Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
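The three levels described above can be sketched as follows. This is only an illustrative approximation, not the WTF::Lock code from this patch: the class name AdaptiveLockSketch, the spin limit, and the helper runCounterDemo() are hypothetical, and the sleeping level uses a std::mutex plus std::condition_variable as a stand-in for the tagged-pointer queue of thread nodes described in the text.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch of a three-level adaptive lock. NOT the WTF::Lock implementation.
class AdaptiveLockSketch {
public:
    void lock()
    {
        // Level 1: one inlined CAS. A weak CAS may fail spuriously,
        // which merely sends us to the slow path.
        bool expected = false;
        if (m_isHeld.compare_exchange_weak(expected, true))
            return;
        lockSlow();
    }

    void unlock()
    {
        assert(m_isHeld.load());
        m_isHeld.store(false);
        // Only touch the system mutex when some thread may be asleep.
        if (m_sleepers.load()) {
            std::lock_guard<std::mutex> guard(m_parkMutex);
            m_parkCondition.notify_one();
        }
    }

private:
    bool tryLock()
    {
        bool expected = false;
        return m_isHeld.compare_exchange_strong(expected, true);
    }

    void lockSlow()
    {
        // Level 2: a short burst of spinning for microcontention.
        for (unsigned i = 0; i < spinLimit; ++i) {
            if (tryLock())
                return;
            std::this_thread::yield();
        }
        // Level 3: go to sleep until an unlock() wakes us.
        std::unique_lock<std::mutex> parkLock(m_parkMutex);
        ++m_sleepers;
        while (!tryLock())
            m_parkCondition.wait(parkLock);
        --m_sleepers;
    }

    static constexpr unsigned spinLimit = 40; // arbitrary value for this sketch
    std::atomic<bool> m_isHeld { false };
    std::atomic<unsigned> m_sleepers { 0 };
    std::mutex m_parkMutex;
    std::condition_variable m_parkCondition;
};

// Hypothetical demo: several threads increment a shared counter under the lock.
unsigned runCounterDemo(unsigned threadCount, unsigned iterations)
{
    AdaptiveLockSketch lock;
    unsigned counter = 0;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) {
        threads.emplace_back([&] {
            for (unsigned i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& thread : threads)
        thread.join();
    return counter;
}
```

The sketch uses sequentially consistent atomics throughout so that an unlocking thread and a thread about to sleep cannot both miss each other; a production lock would likely use weaker orderings and a much smaller footprint than this one.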
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
This patch replaces all uses of SpinLock with Lock. Our critical sections are not no-ops, and
if you do basically anything in your critical section, the Lock overhead will be invisible.
Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead instead of
4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is as it
should be: the purpose of this change is to reduce wasted CPU time, not wall-clock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space savings
of a lock that uses just one byte still outweigh the CPU time that Lock would save. But this
work will enable some future work to create locks that will fit in just 1.6 bits:
https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to the slow path spuriously, it may go there
even if there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal
with the possibility of a null queue head.
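A minimal sketch of why the slow path must tolerate having no waiters when the fast path uses a weak CAS. The lock-word layout here (isHeldBit, hasWaitersBit) and the UnlockSketch type are assumptions made for illustration; the patch itself stores a tagged pointer to a queue head, and the wakeups counter merely stands in for dequeuing and waking a thread.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical lock-word layout for illustration only.
constexpr uint8_t isHeldBit = 1;
constexpr uint8_t hasWaitersBit = 2;

struct UnlockSketch {
    std::atomic<uint8_t> word { 0 };
    unsigned wakeups { 0 }; // stands in for dequeuing and waking one thread

    void unlock()
    {
        uint8_t expected = isHeldBit;
        // A weak CAS is allowed to fail even when word == isHeldBit.
        if (word.compare_exchange_weak(expected, 0, std::memory_order_release, std::memory_order_relaxed))
            return;
        unlockSlow();
    }

    void unlockSlow()
    {
        for (;;) {
            uint8_t current = word.load();
            if (!(current & hasWaitersBit)) {
                // Spurious arrival: nobody is waiting, so just retry the release.
                if (word.compare_exchange_weak(current, 0))
                    return;
                continue;
            }
            // There are waiters: release the lock and wake one of them.
            if (word.compare_exchange_weak(current, 0)) {
                ++wakeups;
                return;
            }
        }
    }
};
```

On architectures where the weak CAS never fails spuriously, the no-waiters branch of unlockSlow() is effectively dead code, which is one way such a bug can go unnoticed until the code runs elsewhere.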
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
/*
2019-09-18 00:36:19 +00:00
* Copyright (C) 2015-2019 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR
* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
* OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
2018-10-15 14:24:49 +00:00
#pragma once
Make CheckedLock the default Lock
https://bugs.webkit.org/show_bug.cgi?id=226157
Reviewed by Darin Adler.
Source/JavaScriptCore:
Make CheckedLock the default Lock so that we get more benefits from Clang
Thread Safety Analysis. Note that CheckedLock 100% relies on the existing
Lock implementation and merely adds the Clang annotations for thread
safety.
What this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker, which a lot of
JSC code passes as an argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. To keep the patch size down, I did not
try very hard to do so in this patch.
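To illustrate what "merely adds the annotations" can look like, here is a minimal sketch of a lock wrapper annotated for Clang Thread Safety Analysis. All names (the SKETCH_* macros, AnnotatedLock, ScopedLocker, Counter) are hypothetical and are not the WTF macros or classes; the attributes themselves (capability, scoped_lockable, guarded_by, acquire_capability, release_capability) are the ones documented by Clang, and the macros expand to nothing on other compilers.

```cpp
#include <mutex>

// Hypothetical macro names; they expand to Clang attributes, or to nothing elsewhere.
#if defined(__clang__)
#define SKETCH_THREAD_ANNOTATION(x) __attribute__((x))
#else
#define SKETCH_THREAD_ANNOTATION(x)
#endif
#define SKETCH_CAPABILITY(name) SKETCH_THREAD_ANNOTATION(capability(name))
#define SKETCH_SCOPED_CAPABILITY SKETCH_THREAD_ANNOTATION(scoped_lockable)
#define SKETCH_GUARDED_BY(lock) SKETCH_THREAD_ANNOTATION(guarded_by(lock))
#define SKETCH_ACQUIRE(...) SKETCH_THREAD_ANNOTATION(acquire_capability(__VA_ARGS__))
#define SKETCH_RELEASE(...) SKETCH_THREAD_ANNOTATION(release_capability(__VA_ARGS__))

// The "checked" lock: same behavior as the underlying lock, plus annotations.
class SKETCH_CAPABILITY("lock") AnnotatedLock {
public:
    void lock() SKETCH_ACQUIRE() { m_impl.lock(); }
    void unlock() SKETCH_RELEASE() { m_impl.unlock(); }

private:
    std::mutex m_impl; // stand-in for the existing, unannotated lock
};

// RAII locker that the analysis understands; constructed from a reference only.
class SKETCH_SCOPED_CAPABILITY ScopedLocker {
public:
    explicit ScopedLocker(AnnotatedLock& lock) SKETCH_ACQUIRE(lock)
        : m_lock(lock)
    {
        m_lock.lock();
    }
    ~ScopedLocker() SKETCH_RELEASE() { m_lock.unlock(); }

private:
    AnnotatedLock& m_lock;
};

class Counter {
public:
    void increment()
    {
        ScopedLocker locker(m_lock);
        ++m_value;
    }

    int value()
    {
        ScopedLocker locker(m_lock);
        return m_value;
    }

private:
    AnnotatedLock m_lock;
    // With -Wthread-safety, Clang warns about accesses to m_value without m_lock held.
    int m_value SKETCH_GUARDED_BY(m_lock) { 0 };
};
```

The analysis is compile-time only, so the annotated lock costs nothing extra at run time; the checks only take effect when building with Clang and the -Wthread-safety warning enabled.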
* assembler/testmasm.cpp:
* dfg/DFGCommon.cpp:
* dfg/DFGThreadData.h:
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::Worklist):
* dfg/DFGWorklist.h:
* dynbench.cpp:
* heap/BlockDirectory.h:
(JSC::BlockDirectory::bitvectorLock):
* heap/CodeBlockSet.h:
(JSC::CodeBlockSet::getLock):
* heap/Heap.cpp:
(JSC::Heap::Heap):
* heap/Heap.h:
* heap/MarkedSpace.h:
(JSC::MarkedSpace::directoryLock):
* heap/MarkingConstraintSolver.h:
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::donateKnownParallel):
* heap/SlotVisitor.h:
* jit/ExecutableAllocator.cpp:
(JSC::ExecutableAllocator::getLock const):
(JSC::dumpJITMemory):
* jit/ExecutableAllocator.h:
(JSC::ExecutableAllocatorBase::getLock const):
* jit/JITWorklist.cpp:
(JSC::JITWorklist::JITWorklist):
* jit/JITWorklist.h:
* jsc.cpp:
* profiler/ProfilerDatabase.h:
* runtime/ConcurrentJSLock.h:
* runtime/DeferredWorkTimer.h:
* runtime/JSLock.h:
* runtime/SamplingProfiler.cpp:
(JSC::FrameWalker::FrameWalker):
(JSC::CFrameWalker::CFrameWalker):
(JSC::SamplingProfiler::takeSample):
* runtime/SamplingProfiler.h:
(JSC::SamplingProfiler::getLock):
* runtime/VM.h:
* runtime/VMTraps.cpp:
(JSC::VMTraps::invalidateCodeBlocksOnStack):
(JSC::VMTraps::VMTraps):
* runtime/VMTraps.h:
* tools/FunctionOverrides.h:
* tools/VMInspector.cpp:
(JSC::ensureIsSafeToLock):
* tools/VMInspector.h:
(JSC::VMInspector::getLock):
* wasm/WasmCalleeRegistry.h:
(JSC::Wasm::CalleeRegistry::getLock):
* wasm/WasmPlan.h:
* wasm/WasmStreamingCompiler.h:
* wasm/WasmThunks.h:
* wasm/WasmWorklist.cpp:
(JSC::Wasm::Worklist::Worklist):
* wasm/WasmWorklist.h:
Source/WebCore:
* Modules/indexeddb/server/IDBServer.cpp:
* Modules/webaudio/MediaElementAudioSourceNode.h:
* Modules/webdatabase/OriginLock.cpp:
* bindings/js/JSDOMGlobalObject.h:
* dom/Node.cpp:
* html/HTMLMediaElement.cpp:
(WebCore::HTMLMediaElement::createMediaPlayer):
* html/canvas/WebGLContextGroup.cpp:
(WebCore::WebGLContextGroup::objectGraphLockForAContext):
* html/canvas/WebGLContextGroup.h:
* html/canvas/WebGLContextObject.cpp:
(WebCore::WebGLContextObject::objectGraphLockForContext):
* html/canvas/WebGLContextObject.h:
* html/canvas/WebGLObject.h:
* html/canvas/WebGLRenderingContextBase.cpp:
(WebCore::WebGLRenderingContextBase::objectGraphLock):
* html/canvas/WebGLRenderingContextBase.h:
* html/canvas/WebGLSharedObject.cpp:
(WebCore::WebGLSharedObject::objectGraphLockForContext):
* html/canvas/WebGLSharedObject.h:
* page/scrolling/mac/ScrollingTreeMac.h:
* platform/audio/ReverbConvolver.cpp:
(WebCore::ReverbConvolver::backgroundThreadEntry):
* platform/graphics/ShadowBlur.cpp:
(WebCore::ScratchBuffer::lock):
(WebCore::ShadowBlur::drawRectShadowWithTiling):
(WebCore::ShadowBlur::drawInsetShadowWithTiling):
* platform/graphics/gstreamer/VideoSinkGStreamer.cpp:
* platform/graphics/gstreamer/eme/WebKitCommonEncryptionDecryptorGStreamer.cpp:
Source/WebKit:
* GPUProcess/graphics/RemoteGraphicsContextGL.cpp:
(WebKit::RemoteGraphicsContextGL::paintPixelBufferToImageBuffer):
* NetworkProcess/IndexedDB/WebIDBServer.cpp:
* UIProcess/API/glib/IconDatabase.h:
* UIProcess/mac/WKPrintingView.mm:
(-[WKPrintingView knowsPageRange:]):
Source/WTF:
Lock implementation and merely adds the clang anotations for thread
safety.
That this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard in this patch
to reduce patch size.
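For readers unfamiliar with Clang Thread Safety Analysis, the following is a minimal sketch of the kind of annotations a checked lock carries. It is not the WTF implementation: `SketchLock`, `SketchLocker`, `Counter`, and the `SKETCH_TSA` macro are hypothetical names chosen for illustration, and `std::mutex` stands in for the real lock. Only the Clang attributes themselves (`capability`, `acquire_capability`, `release_capability`, `scoped_lockable`, `guarded_by`) are real; on other compilers the macro expands to nothing.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical helper macro: expands to a Clang attribute, or to nothing elsewhere.
#if defined(__clang__)
#define SKETCH_TSA(x) __attribute__((x))
#else
#define SKETCH_TSA(x)
#endif

// A lock type that the analysis treats as a capability.
class SKETCH_TSA(capability("mutex")) SketchLock {
public:
    void lock() SKETCH_TSA(acquire_capability()) { m_mutex.lock(); }
    void unlock() SKETCH_TSA(release_capability()) { m_mutex.unlock(); }

private:
    std::mutex m_mutex; // assumption: a system mutex stands in for the real lock
};

// A scoped locker; note that it takes a reference, not a pointer.
class SKETCH_TSA(scoped_lockable) SketchLocker {
public:
    explicit SketchLocker(SketchLock& lock) SKETCH_TSA(acquire_capability(lock))
        : m_lock(lock)
    {
        m_lock.lock();
    }
    ~SketchLocker() SKETCH_TSA(release_capability()) { m_lock.unlock(); }

private:
    SketchLock& m_lock;
};

class Counter {
public:
    int increment()
    {
        SketchLocker locker(m_lock); // holding m_lock makes the access below legal
        return ++m_value;
    }

private:
    SketchLock m_lock;
    // With -Wthread-safety, Clang warns if m_value is touched without m_lock held.
    int m_value SKETCH_TSA(guarded_by(m_lock)) { 0 };
};
```

When built with Clang and `-Wthread-safety`, removing the `SketchLocker` line from `increment()` should produce a compile-time warning, which is the benefit the rename is after.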
* wtf/AutomaticThread.cpp:
(WTF::AutomaticThreadCondition::wait):
(WTF::AutomaticThreadCondition::waitFor):
(WTF::AutomaticThread::AutomaticThread):
* wtf/AutomaticThread.h:
* wtf/CheckedCondition.h:
* wtf/CheckedLock.h:
* wtf/Condition.h:
* wtf/Lock.cpp:
(WTF::UncheckedLock::lockSlow):
(WTF::UncheckedLock::unlockSlow):
(WTF::UncheckedLock::unlockFairlySlow):
(WTF::UncheckedLock::safepointSlow):
* wtf/Lock.h:
(WTF::WTF_ASSERTS_ACQUIRED_LOCK):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocator::MetaAllocator):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
* wtf/MetaAllocator.h:
* wtf/ParallelHelperPool.cpp:
(WTF::ParallelHelperPool::ParallelHelperPool):
* wtf/ParallelHelperPool.h:
* wtf/RecursiveLockAdapter.h:
* wtf/WorkerPool.cpp:
(WTF::WorkerPool::WorkerPool):
* wtf/WorkerPool.h:
Tools:
Lock implementation and merely adds the Clang annotations for thread
safety.
What this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard in this patch
to reduce patch size.
* TestWebKitAPI/Tests/WTF/CheckedConditionTest.cpp:
* TestWebKitAPI/Tests/WTF/Condition.cpp:
* TestWebKitAPI/Tests/WTF/MetaAllocator.cpp:
* WebKitTestRunner/InjectedBundle/AccessibilityController.cpp:
(WTR::AXThread::createThreadIfNeeded):
Canonical link: https://commits.webkit.org/238070@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277943 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-24 05:37:41 +00:00
|
|
|
#include <mutex>
|
The GC should be optionally concurrent and disabled by default
https://bugs.webkit.org/show_bug.cgi?id=164454
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
This started out as a patch to have the GC scan the stack at the end, and then the
outage happened and I decided to pick a more aggressive target: give the GC a concurrent
mode that can be enabled at runtime, and whose only effect is that it turns on the
ResumeTheWorldScope. This gives our GC a really intuitive workflow: by default, the GC
thread is running solo with the world stopped and the parallel markers converged and
waiting. We have a parallel work scope to enable the parallel markers and now we have a
ResumeTheWorldScope that will optionally resume the world and then stop it again.
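The scope-based workflow described above can be sketched as an RAII helper. This is only an illustration under stated assumptions: `SketchHeap` and its members are hypothetical, and the real `Heap::ResumeTheWorldScope` does considerably more than flip a flag.

```cpp
#include <cassert>

// Hypothetical stand-in for the heap; only tracks whether the world is stopped.
class SketchHeap {
public:
    explicit SketchHeap(bool useConcurrentGC)
        : m_useConcurrentGC(useConcurrentGC)
    {
    }

    bool worldIsStopped() const { return m_worldIsStopped; }

    // Resumes the world on entry only if concurrent GC is enabled, and
    // stops it again on exit. With concurrent GC disabled it does nothing.
    class ResumeTheWorldScope {
    public:
        explicit ResumeTheWorldScope(SketchHeap& heap)
            : m_heap(heap)
            , m_didResume(heap.m_useConcurrentGC && heap.m_worldIsStopped)
        {
            if (m_didResume)
                m_heap.m_worldIsStopped = false;
        }

        ~ResumeTheWorldScope()
        {
            if (m_didResume)
                m_heap.m_worldIsStopped = true;
        }

        ResumeTheWorldScope(const ResumeTheWorldScope&) = delete;
        ResumeTheWorldScope& operator=(const ResumeTheWorldScope&) = delete;

    private:
        SketchHeap& m_heap;
        bool m_didResume;
    };

private:
    bool m_useConcurrentGC;
    bool m_worldIsStopped { true }; // assumption: the collector starts with the world stopped
};
```

The point of the design is that collector code is written once: the same scope is a no-op by default and only lets the mutator run when the concurrent mode is switched on.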
It's easy to make a concurrent GC that always instantly crashes. I can't promise that
this one won't do that when you run it. I set a specific goal: I wanted to do >10
concurrent GCs in debug mode with generations, optimizing JITs, and parallel marking
disabled.
To reach this milestone, I needed to do a bunch of stuff:
- The mutator needs a separate mark stack for the barrier, since it will mutate this
stack concurrently to the collector's slot visitors.
- The use of CellState to indicate whether an object is being scanned the first time or
a subsequent time was racy. It fails spectacularly when a barrier is fired at the same
time as visitChildren is running or if the barrier runs at the same time as the GC
marks the same object. So, I split SlotVisitor's mark stacks. It's now the case that
you know why you're being scanned by looking at which stack you came off of.
- All of root marking must be in the collector fixpoint. I renamed markRoots to
markToFixpoint. They say concurrency is hard, but the collector looks more intuitive
this way. We never gained anything from forcing people to make a choice between
scanning something in the fixpoint versus outside of it. Because root scanning is
cheap, we can afford to do it repeatedly, which means all root scanning can now do
constraint-based marking (like: I'll mark you if that thing is marked).
- JSObject::visitChildren's scanning of the butterfly raced with property additions,
indexed storage transitions and resizing, and a bunch of miscellaneous dirty butterfly
reshaping functions - like the one that flattens a dictionary and some sneaky
ArrayStorage transformations. Many of these can be fixed by using store-store fences
in the mutator and load-load fences in the collector. I've adopted the rule that the
collector must always see either a butterfly and structure that match or a newer
butterfly with an older structure, where their age is just one transition apart. This
can be achieved with fences. For the cases where it breaks down, I added a lock to
every JSCell. This is a full-fledged WTF lock that we sneak into two available bits in
the indexingType. See the WTF ChangeLog for details.
The mutator fencing rules are as follows:
- Store-store fence before and after setting the butterfly.
- Store-store fence before setting structure if you had changed the shape of the
butterfly.
- Store-store fence after initializing all fields in an allocation.
- A dictionary Structure can change in strange ways while the GC is trying to scan it.
So, JSObject::visitChildren will now grab the object's structure's lock if the
object's structure is a dictionary. Dictionary structures are 1:1 with their object,
so this does not reduce GC parallelism (super unlikely that the GC will simultaneously
scan an object from two threads).
- The GC can blow away a Structure's property table at any time. As a small consolation,
it's now holding the Structure's lock when it does so. But there was tons of code in
Structure that uses DeferGC to prevent the GC from blowing away the property table.
This doesn't work with concurrent GC, since DeferGC only means that the GC won't run
its safepoint (i.e. stop-the-world code) in the DeferGC region. It will still do
marking and it was the Structure::visitChildren that would delete the table. It turns
out that Structure's reliance on the property table not being deleted was the product
of code rot. We already had functions that would materialize the table on demand. We
were simply making the mistake of saying:
    structure->materializePropertyMap();
    ...
    structure->propertyTable()->things
Instead of saying:
    PropertyTable* table = structure->ensurePropertyTable();
    ...
    table->things
Switching the code to use the latter idiom allowed me to simplify the code a lot while
fixing the race.
- The LLInt's get_by_val handling was broken because the indexing shape constants were
wrong. Once I started putting more things into the IndexingType, that started causing
crashes for me. So I fixed LLInt. That turned out to be a lot of work, since that code
had rotted in subtle ways.
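The mutator fencing rules listed above can be illustrated with a small publication sketch. Everything here is an assumption made for illustration: `SketchButterfly`, `SketchObject`, `publishButterfly`, and `scanButterfly` are hypothetical names, and portable C++ release/acquire fences stand in for the store-store and load-load fences that JSC emits (they are at least as strong).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical out-of-line storage with a length and a few slots.
struct SketchButterfly {
    std::size_t length;
    int slots[4];
};

struct SketchObject {
    std::atomic<SketchButterfly*> butterfly { nullptr };
};

// Mutator side: the caller has already initialized every field of `fresh`.
inline void publishButterfly(SketchObject& object, SketchButterfly* fresh)
{
    // Fence before setting the butterfly, so its contents become visible first.
    std::atomic_thread_fence(std::memory_order_release);
    object.butterfly.store(fresh, std::memory_order_relaxed);
    // Fence after setting the butterfly, ordering it before later stores
    // (for example a subsequent structure change).
    std::atomic_thread_fence(std::memory_order_release);
}

// Collector side: load the pointer, fence, then read through it.
inline int scanButterfly(const SketchObject& object)
{
    SketchButterfly* seen = object.butterfly.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    if (!seen)
        return 0;
    int sum = 0;
    for (std::size_t i = 0; i < seen->length; ++i)
        sum += seen->slots[i];
    return sum;
}
```

The sketch covers only the simple case where fences suffice; the cases where they break down are what the per-cell lock is for.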
This is a speed-up in SunSpider, probably because of the LLInt fix. This is neutral on
Octane and Kraken. It's a smaller slow-down on LongSpider, but I think we can ignore
that (we don't view LongSpider as an official benchmark). By default, the concurrent GC
is disabled: in all of the places where it would have resumed the world to run marking
concurrently to the mutator, it will just skip the resume step. When you enable
concurrent GC (--useConcurrentGC=true), it can sometimes run Octane/splay to completion.
It seems to perform quite well: on my machine, it improves both splay-throughput and
splay-latency. It's probably unstable for other programs.
* API/JSVirtualMachine.mm:
(-[JSVirtualMachine isOldExternalObject:]):
* assembler/MacroAssemblerARMv7.h:
(JSC::MacroAssemblerARMv7::storeFence):
* bytecode/InlineAccess.cpp:
(JSC::InlineAccess::dumpCacheSizesAndCrash):
(JSC::InlineAccess::generateSelfPropertyAccess):
(JSC::InlineAccess::generateArrayLength):
* bytecode/ObjectAllocationProfile.h:
(JSC::ObjectAllocationProfile::offsetOfInlineCapacity):
(JSC::ObjectAllocationProfile::ObjectAllocationProfile):
(JSC::ObjectAllocationProfile::initialize):
(JSC::ObjectAllocationProfile::inlineCapacity):
(JSC::ObjectAllocationProfile::clear):
* bytecode/PolymorphicAccess.cpp:
(JSC::AccessCase::generateWithGuard):
(JSC::AccessCase::generateImpl):
* dfg/DFGArrayifySlowPathGenerator.h:
* dfg/DFGClobberize.h:
(JSC::DFG::clobberize):
* dfg/DFGOSRExitCompiler32_64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOSRExitCompiler64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOperations.cpp:
* dfg/DFGPlan.cpp:
(JSC::DFG::Plan::markCodeBlocks):
(JSC::DFG::Plan::rememberCodeBlocks):
* dfg/DFGPlan.h:
* dfg/DFGSpeculativeJIT.cpp:
(JSC::DFG::SpeculativeJIT::emitAllocateRawObject):
(JSC::DFG::SpeculativeJIT::checkArray):
(JSC::DFG::SpeculativeJIT::arrayify):
(JSC::DFG::SpeculativeJIT::compileMakeRope):
(JSC::DFG::SpeculativeJIT::compileNewFunctionCommon):
(JSC::DFG::SpeculativeJIT::compileCreateActivation):
(JSC::DFG::SpeculativeJIT::compileCreateDirectArguments):
(JSC::DFG::SpeculativeJIT::compileSpread):
(JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileNewStringObject):
(JSC::DFG::SpeculativeJIT::compileNewTypedArray):
(JSC::DFG::SpeculativeJIT::compileStoreBarrier):
* dfg/DFGSpeculativeJIT64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
(JSC::DFG::SpeculativeJIT::compileAllocateNewArrayWithSize):
* dfg/DFGTierUpCheckInjectionPhase.cpp:
(JSC::DFG::TierUpCheckInjectionPhase::run):
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::markCodeBlocks):
(JSC::DFG::Worklist::rememberCodeBlocks):
(JSC::DFG::markCodeBlocks):
(JSC::DFG::completeAllPlansForVM):
(JSC::DFG::rememberCodeBlocks):
* dfg/DFGWorklist.h:
* ftl/FTLAbstractHeapRepository.cpp:
(JSC::FTL::AbstractHeapRepository::AbstractHeapRepository):
(JSC::FTL::AbstractHeapRepository::computeRangesAndDecorateInstructions):
* ftl/FTLAbstractHeapRepository.h:
* ftl/FTLJITCode.cpp:
(JSC::FTL::JITCode::~JITCode):
* ftl/FTLLowerDFGToB3.cpp:
(JSC::FTL::DFG::LowerDFGToB3::compilePutStructure):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::compileNewFunction):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateDirectArguments):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateRest):
(JSC::FTL::DFG::LowerDFGToB3::compileNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArray):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayBuffer):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSize):
(JSC::FTL::DFG::LowerDFGToB3::compileNewTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::compileMakeRope):
(JSC::FTL::DFG::LowerDFGToB3::compileMultiPutByOffset):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::splatWords):
(JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::reallocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::allocateObject):
(JSC::FTL::DFG::LowerDFGToB3::isArrayType):
(JSC::FTL::DFG::LowerDFGToB3::emitStoreBarrier):
(JSC::FTL::DFG::LowerDFGToB3::mutatorFence):
(JSC::FTL::DFG::LowerDFGToB3::setButterfly):
* ftl/FTLOSRExitCompiler.cpp:
(JSC::FTL::compileStub):
* ftl/FTLOutput.cpp:
(JSC::FTL::Output::signExt32ToPtr):
(JSC::FTL::Output::fence):
* ftl/FTLOutput.h:
* heap/CellState.h:
* heap/GCSegmentedArray.h:
* heap/Heap.cpp:
(JSC::Heap::ResumeTheWorldScope::ResumeTheWorldScope):
(JSC::Heap::ResumeTheWorldScope::~ResumeTheWorldScope):
(JSC::Heap::Heap):
(JSC::Heap::~Heap):
(JSC::Heap::harvestWeakReferences):
(JSC::Heap::finalizeUnconditionalFinalizers):
(JSC::Heap::completeAllJITPlans):
(JSC::Heap::markToFixpoint):
(JSC::Heap::gatherStackRoots):
(JSC::Heap::beginMarking):
(JSC::Heap::visitConservativeRoots):
(JSC::Heap::visitCompilerWorklistWeakReferences):
(JSC::Heap::updateObjectCounts):
(JSC::Heap::endMarking):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::collectInThread):
(JSC::Heap::stopTheWorld):
(JSC::Heap::resumeTheWorld):
(JSC::Heap::setGCDidJIT):
(JSC::Heap::setNeedFinalize):
(JSC::Heap::setMutatorWaiting):
(JSC::Heap::clearMutatorWaiting):
(JSC::Heap::finalize):
(JSC::Heap::flushWriteBarrierBuffer):
(JSC::Heap::writeBarrierSlowPath):
(JSC::Heap::canCollect):
(JSC::Heap::reportExtraMemoryVisited):
(JSC::Heap::reportExternalMemoryVisited):
(JSC::Heap::notifyIsSafeToCollect):
(JSC::Heap::markRoots): Deleted.
(JSC::Heap::visitExternalRememberedSet): Deleted.
(JSC::Heap::visitSmallStrings): Deleted.
(JSC::Heap::visitProtectedObjects): Deleted.
(JSC::Heap::visitArgumentBuffers): Deleted.
(JSC::Heap::visitException): Deleted.
(JSC::Heap::visitStrongHandles): Deleted.
(JSC::Heap::visitHandleStack): Deleted.
(JSC::Heap::visitSamplingProfiler): Deleted.
(JSC::Heap::visitTypeProfiler): Deleted.
(JSC::Heap::visitShadowChicken): Deleted.
(JSC::Heap::traceCodeBlocksAndJITStubRoutines): Deleted.
(JSC::Heap::visitWeakHandles): Deleted.
(JSC::Heap::flushOldStructureIDTables): Deleted.
(JSC::Heap::stopAllocation): Deleted.
* heap/Heap.h:
(JSC::Heap::collectorSlotVisitor):
(JSC::Heap::mutatorMarkStack):
(JSC::Heap::mutatorShouldBeFenced):
(JSC::Heap::addressOfMutatorShouldBeFenced):
(JSC::Heap::slotVisitor): Deleted.
(JSC::Heap::notifyIsSafeToCollect): Deleted.
(JSC::Heap::barrierShouldBeFenced): Deleted.
(JSC::Heap::addressOfBarrierShouldBeFenced): Deleted.
* heap/MarkStack.cpp:
(JSC::MarkStackArray::transferTo):
* heap/MarkStack.h:
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::tryAllocateIn):
* heap/MarkedBlock.cpp:
(JSC::MarkedBlock::MarkedBlock):
(JSC::MarkedBlock::Handle::specializedSweep):
(JSC::MarkedBlock::Handle::sweep):
(JSC::MarkedBlock::Handle::sweepHelperSelectMarksMode):
(JSC::MarkedBlock::Handle::stopAllocating):
(JSC::MarkedBlock::Handle::resumeAllocating):
(JSC::MarkedBlock::aboutToMarkSlow):
(JSC::MarkedBlock::Handle::didConsumeFreeList):
(JSC::SetNewlyAllocatedFunctor::SetNewlyAllocatedFunctor): Deleted.
(JSC::SetNewlyAllocatedFunctor::operator()): Deleted.
* heap/MarkedBlock.h:
* heap/MarkedSpace.cpp:
(JSC::MarkedSpace::resumeAllocating):
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::SlotVisitor):
(JSC::SlotVisitor::~SlotVisitor):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::clearMarkStacks):
(JSC::SlotVisitor::appendJSCellOrAuxiliary):
(JSC::SlotVisitor::setMarkedAndAppendToMarkStack):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::appendToMutatorMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::donateKnownParallel):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::containsOpaqueRoot):
(JSC::SlotVisitor::donateAndDrain):
(JSC::SlotVisitor::mergeOpaqueRoots):
(JSC::SlotVisitor::dump):
(JSC::SlotVisitor::clearMarkStack): Deleted.
(JSC::SlotVisitor::opaqueRootCount): Deleted.
* heap/SlotVisitor.h:
(JSC::SlotVisitor::collectorMarkStack):
(JSC::SlotVisitor::mutatorMarkStack):
(JSC::SlotVisitor::isEmpty):
(JSC::SlotVisitor::bytesVisited):
(JSC::SlotVisitor::markStack): Deleted.
(JSC::SlotVisitor::bytesCopied): Deleted.
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::reportExtraMemoryVisited):
(JSC::SlotVisitor::reportExternalMemoryVisited):
* jit/AssemblyHelpers.cpp:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
* jit/AssemblyHelpers.h:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
(JSC::AssemblyHelpers::barrierStoreLoadFence):
(JSC::AssemblyHelpers::mutatorFence):
(JSC::AssemblyHelpers::storeButterfly):
(JSC::AssemblyHelpers::jumpIfMutatorFenceNotNeeded):
(JSC::AssemblyHelpers::emitInitializeInlineStorage):
(JSC::AssemblyHelpers::emitInitializeOutOfLineStorage):
(JSC::AssemblyHelpers::jumpIfBarrierStoreLoadFenceNotNeeded): Deleted.
* jit/JITInlines.h:
(JSC::JIT::emitArrayProfilingSiteWithCell):
* jit/JITOperations.cpp:
* jit/JITPropertyAccess.cpp:
(JSC::JIT::emit_op_put_to_scope):
(JSC::JIT::emit_op_put_to_arguments):
* llint/LLIntData.cpp:
(JSC::LLInt::Data::performAssertions):
* llint/LowLevelInterpreter.asm:
* llint/LowLevelInterpreter64.asm:
* runtime/ButterflyInlines.h:
(JSC::Butterfly::create):
(JSC::Butterfly::createOrGrowPropertyStorage):
* runtime/ConcurrentJITLock.h:
(JSC::GCSafeConcurrentJITLocker::NoDefer::NoDefer): Deleted.
* runtime/GenericArgumentsInlines.h:
(JSC::GenericArguments<Type>::getOwnPropertySlotByIndex):
(JSC::GenericArguments<Type>::putByIndex):
* runtime/IndexingType.h:
* runtime/JSArray.cpp:
(JSC::JSArray::unshiftCountSlowCase):
(JSC::JSArray::unshiftCountWithArrayStorage):
* runtime/JSCell.h:
(JSC::JSCell::InternalLocker::InternalLocker):
(JSC::JSCell::InternalLocker::~InternalLocker):
(JSC::JSCell::atomicCompareExchangeCellStateWeakRelaxed):
(JSC::JSCell::atomicCompareExchangeCellStateStrong):
(JSC::JSCell::indexingTypeAndMiscOffset):
(JSC::JSCell::indexingTypeOffset): Deleted.
* runtime/JSCellInlines.h:
(JSC::JSCell::JSCell):
(JSC::JSCell::finishCreation):
(JSC::JSCell::indexingTypeAndMisc):
(JSC::JSCell::indexingType):
(JSC::JSCell::setStructure):
(JSC::JSCell::callDestructor):
(JSC::JSCell::lockInternalLock):
(JSC::JSCell::unlockInternalLock):
* runtime/JSObject.cpp:
(JSC::JSObject::visitButterfly):
(JSC::JSObject::visitChildren):
(JSC::JSFinalObject::visitChildren):
(JSC::JSObject::enterDictionaryIndexingModeWhenArrayStorageAlreadyExists):
(JSC::JSObject::createInitialUndecided):
(JSC::JSObject::createInitialInt32):
(JSC::JSObject::createInitialDouble):
(JSC::JSObject::createInitialContiguous):
(JSC::JSObject::createArrayStorage):
(JSC::JSObject::convertUndecidedToArrayStorage):
(JSC::JSObject::convertInt32ToArrayStorage):
(JSC::JSObject::convertDoubleToArrayStorage):
(JSC::JSObject::convertContiguousToArrayStorage):
(JSC::JSObject::deleteProperty):
(JSC::JSObject::defineOwnIndexedProperty):
(JSC::JSObject::increaseVectorLength):
(JSC::JSObject::ensureLengthSlow):
(JSC::JSObject::reallocateAndShrinkButterfly):
(JSC::JSObject::allocateMoreOutOfLineStorage):
(JSC::JSObject::shiftButterflyAfterFlattening):
(JSC::JSObject::growOutOfLineStorage): Deleted.
* runtime/JSObject.h:
(JSC::JSFinalObject::JSFinalObject):
(JSC::JSObject::setButterfly):
(JSC::JSObject::getOwnNonIndexPropertySlot):
(JSC::JSObject::fillCustomGetterPropertySlot):
(JSC::JSObject::getOwnPropertySlot):
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::setStructureAndButterfly): Deleted.
(JSC::JSObject::setButterflyWithoutChangingStructure): Deleted.
(JSC::JSObject::putDirectInternal): Deleted.
(JSC::JSObject::putDirectWithoutTransition): Deleted.
* runtime/JSObjectInlines.h:
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::getNonIndexPropertySlot):
(JSC::JSObject::putDirectWithoutTransition):
(JSC::JSObject::putDirectInternal):
* runtime/Options.h:
* runtime/SparseArrayValueMap.h:
* runtime/Structure.cpp:
(JSC::Structure::dumpStatistics):
(JSC::Structure::findStructuresAndMapForMaterialization):
(JSC::Structure::materializePropertyTable):
(JSC::Structure::addNewPropertyTransition):
(JSC::Structure::changePrototypeTransition):
(JSC::Structure::attributeChangeTransition):
(JSC::Structure::toDictionaryTransition):
(JSC::Structure::takePropertyTableOrCloneIfPinned):
(JSC::Structure::nonPropertyTransition):
(JSC::Structure::isSealed):
(JSC::Structure::isFrozen):
(JSC::Structure::flattenDictionaryStructure):
(JSC::Structure::pin):
(JSC::Structure::pinForCaching):
(JSC::Structure::willStoreValueSlow):
(JSC::Structure::copyPropertyTableForPinning):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::getPropertyNamesFromStructure):
(JSC::Structure::visitChildren):
(JSC::Structure::materializePropertyMap): Deleted.
(JSC::Structure::addPropertyWithoutTransition): Deleted.
(JSC::Structure::removePropertyWithoutTransition): Deleted.
(JSC::Structure::copyPropertyTable): Deleted.
(JSC::Structure::createPropertyMap): Deleted.
(JSC::PropertyTable::checkConsistency): Deleted.
(JSC::Structure::checkConsistency): Deleted.
* runtime/Structure.h:
* runtime/StructureIDBlob.h:
(JSC::StructureIDBlob::StructureIDBlob):
(JSC::StructureIDBlob::indexingTypeIncludingHistory):
(JSC::StructureIDBlob::setIndexingTypeIncludingHistory):
(JSC::StructureIDBlob::indexingTypeIncludingHistoryOffset):
(JSC::StructureIDBlob::indexingType): Deleted.
(JSC::StructureIDBlob::setIndexingType): Deleted.
(JSC::StructureIDBlob::indexingTypeOffset): Deleted.
* runtime/StructureInlines.h:
(JSC::Structure::get):
(JSC::Structure::checkOffsetConsistency):
(JSC::Structure::checkConsistency):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::addPropertyWithoutTransition):
(JSC::Structure::removePropertyWithoutTransition):
(JSC::Structure::setPropertyTable):
(JSC::Structure::putWillGrowOutOfLineStorage): Deleted.
(JSC::Structure::propertyTable): Deleted.
(JSC::Structure::suggestedNewOutOfLineStorageCapacity): Deleted.
Source/WTF:
The reason why I went to such great pains to make WTF::Lock fit in two bits is that I
knew that I would eventually need to stuff one into some miscellaneous bits of the
JSCell header. That time has come, because the concurrent GC has numerous race
conditions in visitChildren that can be trivially fixed if each object just has an
internal lock. Some cell types might use it to simply protect their entire visitChildren
function and anything that mutates the fields it touches, while other cell types might
use it as a "lock of last resort" to handle corner cases of an otherwise wait-free or
lock-free algorithm. Right now, it's used to protect certain transformations involving
indexing storage.
To make this happen, I factored the WTF::Lock algorithm into a LockAlgorithm struct that
is templatized on lock type (uint8_t for WTF::Lock), the isHeldBit value (1 for
WTF::Lock), and the hasParkedBit value (2 for WTF::Lock). This could have been done as
a templatized Lock class that basically contains Atomic<LockType>. You could then make
any field into a lock by bitwise_casting it to TemplateLock<field type, bit1, bit2>. But
this felt too dirty, so instead, LockAlgorithm has static methods that take
Atomic<LockType>& as their first argument. I think that this makes it more natural to
project a LockAlgorithm onto an existing Atomic<> field. Sadly, some places have to cast
their non-Atomic<> field to Atomic<> in order for this to work. Like so many other things
we do, this just shows that the C++ style of labeling fields that are subject to atomic
ops as atomic is counterproductive. Maybe some day I'll change LockAlgorithm to use our
other Atomics API, which does not require Atomic<>.
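As a rough sketch of the shape described above, here is a lock algorithm expressed as static methods over an existing atomic word. It is not the contents of LockAlgorithm.h: `SketchLockAlgorithm` and `SketchCellLock` are hypothetical, `std::atomic` stands in for WTF's `Atomic<>`, and contended threads simply yield instead of parking, so the `hasParkedBit` parameter is carried but unused.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Templatized on the word type and on which bits the lock owns.
// All other bits in the word belong to someone else and are preserved.
template<typename LockType, LockType isHeldBit, LockType hasParkedBit>
struct SketchLockAlgorithm {
    static bool tryLock(std::atomic<LockType>& word)
    {
        LockType old = word.load(std::memory_order_relaxed);
        for (;;) {
            if (old & isHeldBit)
                return false;
            // On failure (including spurious failure) `old` is refreshed and we retry.
            if (word.compare_exchange_weak(old, static_cast<LockType>(old | isHeldBit),
                    std::memory_order_acquire, std::memory_order_relaxed))
                return true;
        }
    }

    static void lock(std::atomic<LockType>& word)
    {
        // Assumption: yield rather than park; the real slow path queues the thread.
        while (!tryLock(word))
            std::this_thread::yield();
    }

    static void unlock(std::atomic<LockType>& word)
    {
        word.fetch_and(static_cast<LockType>(~isHeldBit), std::memory_order_release);
    }

    static bool isLocked(const std::atomic<LockType>& word)
    {
        return word.load(std::memory_order_acquire) & isHeldBit;
    }
};

// Hypothetical projection onto a header byte whose low six bits hold other data.
using SketchCellLock = SketchLockAlgorithm<uint8_t, 0x40, 0x80>;
```

Because the methods take the atomic word as an argument, any byte with two spare bits can be used as a lock without wrapping it in a dedicated lock type.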
WTF::Lock now uses LockAlgorithm. The slow paths are still outlined. I don't feel too
bad about the LockAlgorithm.h header being included in so many places because we change
that algorithm so infrequently.
Also, I added a hasElapsed(time) function. This function makes it so much more natural
to write timeslicing code, which the concurrent GC has to do a lot of.
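To show why such a helper makes timeslicing read naturally, here is a small sketch using `std::chrono`. The names `hasElapsedSketch`, `SketchClock`, and `runForSlice` are hypothetical, and the real `WTF::hasElapsed` operates on WTF's own time types rather than on `std::chrono` time points.

```cpp
#include <cassert>
#include <chrono>

using SketchClock = std::chrono::steady_clock;

// True once the given point in time has been reached.
inline bool hasElapsedSketch(SketchClock::time_point deadline)
{
    return SketchClock::now() >= deadline;
}

// Runs work units until either the slice is used up or maxUnits are done.
// Returns how many units actually ran.
template<typename Func>
int runForSlice(std::chrono::milliseconds slice, int maxUnits, const Func& doOneUnit)
{
    SketchClock::time_point deadline = SketchClock::now() + slice;
    int done = 0;
    while (done < maxUnits && !hasElapsedSketch(deadline)) {
        doOneUnit();
        ++done;
    }
    return done;
}
```

A collector loop written this way checks the deadline between units of work and hands control back as soon as its slice is over.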
* WTF.xcodeproj/project.pbxproj:
* wtf/CMakeLists.txt:
* wtf/ListDump.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl): Deleted.
* wtf/Lock.h:
(WTF::LockBase::lock):
(WTF::LockBase::tryLock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(): Deleted.
* wtf/LockAlgorithm.h: Added.
(WTF::LockAlgorithm::lockFastAssumingZero):
(WTF::LockAlgorithm::lockFast):
(WTF::LockAlgorithm::lock):
(WTF::LockAlgorithm::tryLock):
(WTF::LockAlgorithm::unlockFastAssumingZero):
(WTF::LockAlgorithm::unlockFast):
(WTF::LockAlgorithm::unlock):
(WTF::LockAlgorithm::unlockFairly):
(WTF::LockAlgorithm::isLocked):
(WTF::LockAlgorithm::lockSlow):
(WTF::LockAlgorithm::unlockSlow):
* wtf/TimeWithDynamicClockType.cpp:
(WTF::hasElapsed):
* wtf/TimeWithDynamicClockType.h:
Canonical link: https://commits.webkit.org/182434@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@208720 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-11-15 01:49:22 +00:00
|
|
|
#include <wtf/LockAlgorithm.h>
|
Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
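The three levels of adaptation can be sketched as follows. This is a simplified illustration, not the patch's implementation: `SketchAdaptiveLock` is a hypothetical name, the spin limit of 40 is an arbitrary assumption, and a `std::mutex` plus `std::condition_variable` stand in for the queue of thread nodes, so the sketch is far larger than the pointer-sized lock described above.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

class SketchAdaptiveLock {
public:
    void lock()
    {
        // Level 1: fast path, a single CAS when the lock is free.
        uint8_t expected = 0;
        if (m_state.compare_exchange_strong(expected, heldBit,
                std::memory_order_acquire, std::memory_order_relaxed))
            return;
        lockSlow();
    }

    void unlock()
    {
        uint8_t expected = heldBit;
        if (m_state.compare_exchange_strong(expected, 0,
                std::memory_order_release, std::memory_order_relaxed))
            return;
        unlockSlow(); // someone is asleep and must be woken
    }

private:
    static constexpr uint8_t heldBit = 1;
    static constexpr uint8_t hasSleepersBit = 2;
    static constexpr int spinLimit = 40; // assumption: arbitrary value

    void lockSlow()
    {
        // Level 2: a short burst of spinning for microcontention.
        for (int i = 0; i < spinLimit; ++i) {
            uint8_t s = m_state.load(std::memory_order_relaxed);
            if (!(s & heldBit) && m_state.compare_exchange_weak(s, static_cast<uint8_t>(s | heldBit),
                    std::memory_order_acquire, std::memory_order_relaxed))
                return;
            std::this_thread::yield();
        }
        // Level 3: go to sleep until an unlocker wakes us.
        std::unique_lock<std::mutex> guard(m_sleepMutex);
        for (;;) {
            uint8_t s = m_state.load(std::memory_order_relaxed);
            if (!(s & heldBit)) {
                if (m_state.compare_exchange_weak(s, static_cast<uint8_t>(s | heldBit),
                        std::memory_order_acquire, std::memory_order_relaxed))
                    return;
                continue;
            }
            // Announce that a sleeper exists before waiting; retry if the state moved.
            if (!(s & hasSleepersBit) && !m_state.compare_exchange_weak(s, static_cast<uint8_t>(s | hasSleepersBit),
                    std::memory_order_relaxed, std::memory_order_relaxed))
                continue;
            m_sleepCondition.wait(guard);
        }
    }

    void unlockSlow()
    {
        // Taking m_sleepMutex ensures no sleeper is between "set bit" and "wait".
        std::lock_guard<std::mutex> guard(m_sleepMutex);
        m_state.store(0, std::memory_order_release);
        m_sleepCondition.notify_all();
    }

    std::atomic<uint8_t> m_state { 0 };
    std::mutex m_sleepMutex;
    std::condition_variable m_sleepCondition;
};

// Small demonstration: several threads increment a counter under the lock.
inline int contendedCountSketch(int threadCount, int incrementsPerThread)
{
    SketchAdaptiveLock lock;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (std::thread& worker : workers)
        worker.join();
    return counter;
}
```

Unlike a fair queue, this sketch wakes every sleeper on unlock and lets them race; it is meant to show the tiers, not the queueing discipline.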
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
This patch replaces all uses of SpinLock with Lock, since our critical sections are not
no-ops so if you do basically anything in your critical section, the Lock overhead will be
invisible. Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
as it should be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space benefits
of having a lock that just uses a byte are still better than the CPU wastage benefits of
Lock. But, this work will enable some future work to create locks that will fit in just 1.6
bits: https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to slow path spuriously, it may go there even if
there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal with the
possibility of a null queue head.
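The weak-CAS issue can be sketched in isolation. In this illustration `SketchTaggedLock` is a hypothetical name, the word encoding (0 for not held, 1 for held with no queue) is an assumption, and waiter queues are not modeled at all; `unlockSlow()` is public only so that the "reached the slow path with no queue" case can be exercised directly.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class SketchTaggedLock {
public:
    bool tryLock()
    {
        uintptr_t expected = notHeld;
        return m_word.compare_exchange_strong(expected, heldWithNoQueue,
            std::memory_order_acquire, std::memory_order_relaxed);
    }

    void unlock()
    {
        // A weak CAS may fail even though the word still equals `expected`,
        // so failure does not prove that any thread is queued.
        uintptr_t expected = heldWithNoQueue;
        if (m_word.compare_exchange_weak(expected, notHeld,
                std::memory_order_release, std::memory_order_relaxed))
            return;
        unlockSlow();
    }

    void unlockSlow()
    {
        uintptr_t current = m_word.load(std::memory_order_relaxed);
        // Null queue head: the fast path failed spuriously, so just release.
        // A failed CAS refreshes `current`, and the loop re-checks it.
        while (current == heldWithNoQueue) {
            if (m_word.compare_exchange_weak(current, notHeld,
                    std::memory_order_release, std::memory_order_relaxed))
                return;
        }
        // Otherwise `current` would name a queue head to dequeue and wake.
        // Queues are not modeled in this sketch.
    }

    bool isHeld() const { return m_word.load(std::memory_order_acquire) != notHeld; }

private:
    static constexpr uintptr_t notHeld = 0;
    static constexpr uintptr_t heldWithNoQueue = 1;
    std::atomic<uintptr_t> m_word { notHeld };
};
```

On architectures where the weak CAS never fails spuriously, the slow path is only reached with waiters present, which is presumably why the missing null check went unnoticed at first.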
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
|
|
|
#include <wtf/Locker.h>
|
|
|
|
#include <wtf/Noncopyable.h>
|
2021-05-30 06:51:57 +00:00
|
|
|
#include <wtf/Seconds.h>
|
Make CheckedLock the default Lock
https://bugs.webkit.org/show_bug.cgi?id=226157
Reviewed by Darin Adler.
Source/JavaScriptCore:
Make CheckedLock the default Lock so that we get more benefits from Clang
Thread Safety Analysis. Note that CheckedLock 100% relies on the existing
Lock implementation and merely adds the clang annotations for thread
safety.
What this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. To keep the patch small, I did not try
very hard to do so in this patch.
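As a rough illustration of what "adds the clang annotations" can look like, here is a minimal sketch using Clang's documented thread safety attributes. The SKETCH_* macros and the AnnotatedLock, AnnotatedLocker, and GuardedCounterSketch names are hypothetical and are not WTF's actual macros or classes. The macros expand to nothing on compilers other than Clang. The locker mirrors two of the constraints listed above: it is built from a reference and cannot be copied or moved.

```cpp
#include <mutex>

// Hypothetical macro names for this sketch; WTF uses its own macros.
#if defined(__clang__)
#define SKETCH_THREAD_ANNOTATION(x) __attribute__((x))
#else
#define SKETCH_THREAD_ANNOTATION(x)
#endif
#define SKETCH_CAPABILITY(name) SKETCH_THREAD_ANNOTATION(capability(name))
#define SKETCH_SCOPED_CAPABILITY SKETCH_THREAD_ANNOTATION(scoped_lockable)
#define SKETCH_GUARDED_BY(lock) SKETCH_THREAD_ANNOTATION(guarded_by(lock))
#define SKETCH_ACQUIRE(...) SKETCH_THREAD_ANNOTATION(acquire_capability(__VA_ARGS__))
#define SKETCH_RELEASE(...) SKETCH_THREAD_ANNOTATION(release_capability(__VA_ARGS__))

// The annotated lock only forwards to an existing lock implementation
// (std::mutex stands in for it here).
class SKETCH_CAPABILITY("mutex") AnnotatedLock {
public:
    void lock() SKETCH_ACQUIRE() { m_impl.lock(); }
    void unlock() SKETCH_RELEASE() { m_impl.unlock(); }

private:
    std::mutex m_impl;
};

// Scoped locker: constructed from a reference only, not copyable or movable.
class SKETCH_SCOPED_CAPABILITY AnnotatedLocker {
public:
    explicit AnnotatedLocker(AnnotatedLock& lock) SKETCH_ACQUIRE(lock)
        : m_lock(lock)
    {
        m_lock.lock();
    }
    ~AnnotatedLocker() SKETCH_RELEASE() { m_lock.unlock(); }

    AnnotatedLocker(const AnnotatedLocker&) = delete;
    AnnotatedLocker& operator=(const AnnotatedLocker&) = delete;

private:
    AnnotatedLock& m_lock;
};

class GuardedCounterSketch {
public:
    int increment()
    {
        AnnotatedLocker locker(m_lock);
        return ++m_value; // Accepted by the analysis: m_lock is held here.
    }

private:
    AnnotatedLock m_lock;
    int m_value SKETCH_GUARDED_BY(m_lock) { 0 };
};
```

When built with Clang and -Wthread-safety, touching m_value without holding m_lock should produce a warning; other compilers simply ignore the annotations.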
* assembler/testmasm.cpp:
* dfg/DFGCommon.cpp:
* dfg/DFGThreadData.h:
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::Worklist):
* dfg/DFGWorklist.h:
* dynbench.cpp:
* heap/BlockDirectory.h:
(JSC::BlockDirectory::bitvectorLock):
* heap/CodeBlockSet.h:
(JSC::CodeBlockSet::getLock):
* heap/Heap.cpp:
(JSC::Heap::Heap):
* heap/Heap.h:
* heap/MarkedSpace.h:
(JSC::MarkedSpace::directoryLock):
* heap/MarkingConstraintSolver.h:
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::donateKnownParallel):
* heap/SlotVisitor.h:
* jit/ExecutableAllocator.cpp:
(JSC::ExecutableAllocator::getLock const):
(JSC::dumpJITMemory):
* jit/ExecutableAllocator.h:
(JSC::ExecutableAllocatorBase::getLock const):
* jit/JITWorklist.cpp:
(JSC::JITWorklist::JITWorklist):
* jit/JITWorklist.h:
* jsc.cpp:
* profiler/ProfilerDatabase.h:
* runtime/ConcurrentJSLock.h:
* runtime/DeferredWorkTimer.h:
* runtime/JSLock.h:
* runtime/SamplingProfiler.cpp:
(JSC::FrameWalker::FrameWalker):
(JSC::CFrameWalker::CFrameWalker):
(JSC::SamplingProfiler::takeSample):
* runtime/SamplingProfiler.h:
(JSC::SamplingProfiler::getLock):
* runtime/VM.h:
* runtime/VMTraps.cpp:
(JSC::VMTraps::invalidateCodeBlocksOnStack):
(JSC::VMTraps::VMTraps):
* runtime/VMTraps.h:
* tools/FunctionOverrides.h:
* tools/VMInspector.cpp:
(JSC::ensureIsSafeToLock):
* tools/VMInspector.h:
(JSC::VMInspector::getLock):
* wasm/WasmCalleeRegistry.h:
(JSC::Wasm::CalleeRegistry::getLock):
* wasm/WasmPlan.h:
* wasm/WasmStreamingCompiler.h:
* wasm/WasmThunks.h:
* wasm/WasmWorklist.cpp:
(JSC::Wasm::Worklist::Worklist):
* wasm/WasmWorklist.h:
Source/WebCore:
* Modules/indexeddb/server/IDBServer.cpp:
* Modules/webaudio/MediaElementAudioSourceNode.h:
* Modules/webdatabase/OriginLock.cpp:
* bindings/js/JSDOMGlobalObject.h:
* dom/Node.cpp:
* html/HTMLMediaElement.cpp:
(WebCore::HTMLMediaElement::createMediaPlayer):
* html/canvas/WebGLContextGroup.cpp:
(WebCore::WebGLContextGroup::objectGraphLockForAContext):
* html/canvas/WebGLContextGroup.h:
* html/canvas/WebGLContextObject.cpp:
(WebCore::WebGLContextObject::objectGraphLockForContext):
* html/canvas/WebGLContextObject.h:
* html/canvas/WebGLObject.h:
* html/canvas/WebGLRenderingContextBase.cpp:
(WebCore::WebGLRenderingContextBase::objectGraphLock):
* html/canvas/WebGLRenderingContextBase.h:
* html/canvas/WebGLSharedObject.cpp:
(WebCore::WebGLSharedObject::objectGraphLockForContext):
* html/canvas/WebGLSharedObject.h:
* page/scrolling/mac/ScrollingTreeMac.h:
* platform/audio/ReverbConvolver.cpp:
(WebCore::ReverbConvolver::backgroundThreadEntry):
* platform/graphics/ShadowBlur.cpp:
(WebCore::ScratchBuffer::lock):
(WebCore::ShadowBlur::drawRectShadowWithTiling):
(WebCore::ShadowBlur::drawInsetShadowWithTiling):
* platform/graphics/gstreamer/VideoSinkGStreamer.cpp:
* platform/graphics/gstreamer/eme/WebKitCommonEncryptionDecryptorGStreamer.cpp:
Source/WebKit:
* GPUProcess/graphics/RemoteGraphicsContextGL.cpp:
(WebKit::RemoteGraphicsContextGL::paintPixelBufferToImageBuffer):
* NetworkProcess/IndexedDB/WebIDBServer.cpp:
* UIProcess/API/glib/IconDatabase.h:
* UIProcess/mac/WKPrintingView.mm:
(-[WKPrintingView knowsPageRange:]):
Source/WTF:
* wtf/AutomaticThread.cpp:
(WTF::AutomaticThreadCondition::wait):
(WTF::AutomaticThreadCondition::waitFor):
(WTF::AutomaticThread::AutomaticThread):
* wtf/AutomaticThread.h:
* wtf/CheckedCondition.h:
* wtf/CheckedLock.h:
* wtf/Condition.h:
* wtf/Lock.cpp:
(WTF::UncheckedLock::lockSlow):
(WTF::UncheckedLock::unlockSlow):
(WTF::UncheckedLock::unlockFairlySlow):
(WTF::UncheckedLock::safepointSlow):
* wtf/Lock.h:
(WTF::WTF_ASSERTS_ACQUIRED_LOCK):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocator::MetaAllocator):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
* wtf/MetaAllocator.h:
* wtf/ParallelHelperPool.cpp:
(WTF::ParallelHelperPool::ParallelHelperPool):
* wtf/ParallelHelperPool.h:
* wtf/RecursiveLockAdapter.h:
* wtf/WorkerPool.cpp:
(WTF::WorkerPool::WorkerPool):
* wtf/WorkerPool.h:
Tools:
* TestWebKitAPI/Tests/WTF/CheckedConditionTest.cpp:
* TestWebKitAPI/Tests/WTF/Condition.cpp:
* TestWebKitAPI/Tests/WTF/MetaAllocator.cpp:
* WebKitTestRunner/InjectedBundle/AccessibilityController.cpp:
(WTR::AXThread::createThreadIfNeeded):
Canonical link: https://commits.webkit.org/238070@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277943 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-24 05:37:41 +00:00
|
|
|
#include <wtf/ThreadSafetyAnalysis.h>
|
Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
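The three levels described above can be sketched roughly as follows. This is a simplified illustration and not the WTF::Lock implementation: a std::mutex plus std::condition_variable stands in for the queue of statically allocated thread nodes, the lock state is a plain atomic flag rather than a tagged pointer, and the names AdaptiveLockSketch, spinLimit, and countWithAdaptiveLock are made up for the example.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch, not WTF::Lock.
class AdaptiveLockSketch {
public:
    void lock()
    {
        // Level 1: inlined fast path. A weak CAS may fail spuriously, which
        // only costs a trip through the slow path.
        bool expected = false;
        if (m_isHeld.compare_exchange_weak(expected, true))
            return;
        lockSlow();
    }

    void unlock()
    {
        m_isHeld.store(false);
        // Only pay for a wakeup when some thread may be asleep.
        if (m_sleeperCount.load() > 0) {
            std::lock_guard<std::mutex> guard(m_queueMutex);
            m_wakeup.notify_one();
        }
    }

private:
    bool tryAcquire()
    {
        // Strong CAS: a spurious failure here could put a thread to sleep
        // on a lock that nobody holds.
        bool expected = false;
        return m_isHeld.compare_exchange_strong(expected, true);
    }

    void lockSlow()
    {
        // Level 2: a short burst of spinning for microcontention.
        for (unsigned i = 0; i < spinLimit; ++i) {
            if (tryAcquire())
                return;
            std::this_thread::yield();
        }
        // Level 3: register as a sleeper and block until woken.
        std::unique_lock<std::mutex> guard(m_queueMutex);
        ++m_sleeperCount;
        while (!tryAcquire())
            m_wakeup.wait(guard);
        --m_sleeperCount;
    }

    static constexpr unsigned spinLimit = 40; // Arbitrary value for the sketch.
    std::atomic<bool> m_isHeld { false };
    std::atomic<unsigned> m_sleeperCount { 0 };
    std::mutex m_queueMutex;
    std::condition_variable m_wakeup;
};

// Usage: several threads bump a plain counter under the lock.
long countWithAdaptiveLock(unsigned threadCount, unsigned iterationsPerThread)
{
    AdaptiveLockSketch lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) {
        threads.emplace_back([&] {
            for (unsigned i = 0; i < iterationsPerThread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& thread : threads)
        thread.join();
    return counter;
}
```

The sketch uses sequentially consistent atomics throughout to keep the sleep/wake handshake easy to reason about. A production lock would likely use weaker orderings and a much smaller footprint.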
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
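The memory ratios in the last bullet follow from the sizes quoted earlier in this message. A small worked check, assuming a 64-bit target, a 64-byte system mutex (the message says "64 bytes or more"), and the 4-byte SpinLock flavor; the constant and function names are hypothetical:

```cpp
#include <cstddef>

// Sizes taken from the commit message; assumptions for a 64-bit target.
constexpr std::size_t systemMutexBytes = 64; // "64 bytes or more"
constexpr std::size_t spinLockBytes = 4;     // the 4-byte SpinLock flavor
constexpr std::size_t lockBytes = 8;         // sizeof(void*) on 64-bit

constexpr std::size_t savingsVersusSystemMutex() { return systemMutexBytes / lockBytes; }
constexpr std::size_t costVersusSpinLock() { return lockBytes / spinLockBytes; }

static_assert(savingsVersusSystemMutex() == 8, "8x less than a system mutex");
static_assert(costVersusSpinLock() == 2, "2x worse than a SpinLock");
```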
This patch replaces all uses of SpinLock with Lock. Our critical sections are not no-ops, and
if you do basically anything in your critical section, the Lock overhead will be invisible.
Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead instead of
4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is as it
should be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space benefits
of having a lock that just uses a byte still outweigh the CPU time savings of Lock. But, this
work will enable some future work to create locks that will fit in just 1.6 bits:
https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to slow path spuriously, it may go there even if
there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal with the
possibility of a null queue head.
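To make the null queue head case concrete, here is a minimal sketch of a tagged-word lock whose unlock() fast path uses a weak CAS. It is not the code from Lock.cpp: TaggedLockSketch, WaiterNodeSketch, and enqueueWhileHeld() are hypothetical, the waiter list is a simple intrusive stack, and the example only walks through the state transitions on one thread (a real implementation also has to serialize queue manipulation and actually wake the thread).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct WaiterNodeSketch {
    WaiterNodeSketch* next { nullptr };
    bool shouldWake { false };
};

class TaggedLockSketch {
public:
    static constexpr uintptr_t isHeldBit = 1;

    bool tryLock()
    {
        uintptr_t expected = 0;
        return m_word.compare_exchange_strong(expected, isHeldBit);
    }

    void unlock()
    {
        // Fast path: held with no waiters. On a truly weak CAS this can fail
        // even though the word equals isHeldBit.
        uintptr_t expected = isHeldBit;
        if (m_word.compare_exchange_weak(expected, 0))
            return;
        unlockSlow();
    }

    void unlockSlow()
    {
        for (;;) {
            uintptr_t word = m_word.load();
            auto* head = reinterpret_cast<WaiterNodeSketch*>(word & ~isHeldBit);
            if (!head) {
                // We got here spuriously: nobody is queued, so just release.
                if (m_word.compare_exchange_weak(word, 0))
                    return;
                continue;
            }
            // Release the lock, pop the head waiter, and mark it for wakeup.
            uintptr_t released = reinterpret_cast<uintptr_t>(head->next);
            if (m_word.compare_exchange_weak(word, released)) {
                head->shouldWake = true;
                return;
            }
        }
    }

    // Hypothetical helper for the walkthrough: pushes a waiter while held.
    void enqueueWhileHeld(WaiterNodeSketch* node)
    {
        uintptr_t bits = reinterpret_cast<uintptr_t>(node);
        assert(!(bits & isHeldBit)); // The low bit must be free for the tag.
        uintptr_t word = m_word.load();
        assert(word & isHeldBit);
        node->next = reinterpret_cast<WaiterNodeSketch*>(word & ~isHeldBit);
        m_word.store(bits | isHeldBit);
    }

    bool isHeld() const { return m_word.load() & isHeldBit; }

private:
    std::atomic<uintptr_t> m_word { 0 };
};
```

Calling unlockSlow() directly on a held lock with no waiters simulates the spurious fast-path failure: the slow path sees a null head and simply releases the lock.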
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
|
|
|
|
WTF::Lock should not suffer from the thundering herd
https://bugs.webkit.org/show_bug.cgi?id=147947
Reviewed by Geoffrey Garen.
Source/WTF:
This changes Lock::unlockSlow() to use unparkOne() instead of unparkAll(). The problem with
doing this is that it's not obvious after calling unparkOne() if there are any other threads
that are still parked on the lock's queue. If we assume that there are and leave the
hasParkedBit set, then future calls to unlock() will take the slow path. We don't want that
if there aren't actually any threads parked. On the other hand, if we assume that there
aren't any threads parked and clear the hasParkedBit, then if there actually were some
threads parked, they may never be awoken since future calls to unlock() won't take the slow
path and so won't call unparkOne(). In other words, we need a way to be very precise about
when we clear the hasParkedBit and we need to do it in a race-free way: it can't be the case
that we clear the bit just as some thread gets parked on the queue.
A similar problem arises in futexes, and one of the solutions is to have a thread that
acquires a lock after parking sets the hasParkedBit. This is what Rusty Russell's usersem
does. It's a subtle algorithm. Also, it means that if a thread barges in before the unparked
thread runs, then that barging thread will not know that there are threads parked. This
could increase the severity of barging.
Since ParkingLot is a user-level API, we don't have to worry about the kernel-user security
issues and so we can expose callbacks while ParkingLot is holding its internal locks. This
change does exactly that for unparkOne(). The new variant of unparkOne() will call a user
function while the queue from which we are unparking is locked. The callback is told basic
stats about the queue: did we unpark a thread this time, and could there be more threads to
unpark in the future. The callback runs while it's impossible for the queue state to change,
since the ParkingLot's internal lock for the queue is held. This means that
Lock::unlockSlow() can either clear, or leave, the hasParkedBit while releasing the lock
inside the callback from unparkOne(). This takes care of the thundering herd problem while
also reducing the greed that arises from barging threads.
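The callback idea can be sketched with a toy queue. This is only an illustration of the protocol described above, not the ParkingLot API: ToyParkingQueue, UnparkResultSketch, and HerdFreeLockSketch are hypothetical names, waiters are plain integers instead of sleeping threads, and holdWithParkedWaiters() exists only to set up the scenario.

```cpp
#include <atomic>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>

// What the callback is told about the queue (hypothetical struct).
struct UnparkResultSketch {
    bool didUnparkThread { false };
    bool mayHaveMoreThreads { false };
};

class ToyParkingQueue {
public:
    void park(int waiterId)
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_waiters.push_back(waiterId);
    }

    // Dequeues at most one waiter, then runs the callback while the queue is
    // still locked, so the queue state cannot change underneath it.
    void unparkOne(const std::function<void(UnparkResultSketch)>& callback)
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        UnparkResultSketch result;
        if (!m_waiters.empty()) {
            m_waiters.pop_front();
            result.didUnparkThread = true;
        }
        result.mayHaveMoreThreads = !m_waiters.empty();
        callback(result);
    }

private:
    std::mutex m_mutex;
    std::deque<int> m_waiters;
};

class HerdFreeLockSketch {
public:
    static constexpr uint8_t isHeldBit = 1;
    static constexpr uint8_t hasParkedBit = 2;

    // Scenario setup: the lock is held and waiterCount threads are parked.
    void holdWithParkedWaiters(int waiterCount)
    {
        for (int id = 0; id < waiterCount; ++id)
            m_queue.park(id);
        m_byte.store(isHeldBit | hasParkedBit);
    }

    void unlockSlow()
    {
        // Release the lock inside the callback, keeping hasParkedBit only if
        // the queue says more threads may still be parked.
        m_queue.unparkOne([this](UnparkResultSketch result) {
            m_byte.store(result.mayHaveMoreThreads ? hasParkedBit : 0);
        });
    }

    uint8_t state() const { return m_byte.load(); }

private:
    std::atomic<uint8_t> m_byte { 0 };
    ToyParkingQueue m_queue;
};
```

With two parked waiters, the first unlockSlow() wakes one and leaves hasParkedBit set. A second call, standing in for the woken thread's own release, wakes the last waiter and returns the lock to its pristine state.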
This required some careful reworking of the ParkingLot algorithm. The first thing I noticed
was that the ThreadData::shouldPark flag was useless, since it's set exactly when
ThreadData::address is non-null. Then I had to make sure that dequeue() could lazily create
both hashtables and buckets, since the "callback is called while queue is locked" invariant
requires that we don't exit early due to the hashtable or bucket not being present. Note
that all of this is done in such a way that the old unparkOne() and unparkAll() don't have
to create any buckets, though they now may create the hashtable. We don't care as much about
the hashtable being created by unpark since it's just such an unlikely scenario and it would
only happen once.
This change reduces the kernel CPU usage of WTF::Lock for the long critical section test by
about 8x and makes it always perform as well as WTF::WordLock and WTF::Mutex for that
benchmark.
* benchmarks/LockSpeedTest.cpp:
* wtf/Lock.cpp:
(WTF::LockBase::unlockSlow):
* wtf/Lock.h:
(WTF::LockBase::isLocked):
(WTF::LockBase::isFullyReset):
* wtf/ParkingLot.cpp:
(WTF::ParkingLot::parkConditionally):
(WTF::ParkingLot::unparkOne):
(WTF::ParkingLot::unparkAll):
* wtf/ParkingLot.h:
* wtf/WordLock.h:
(WTF::WordLock::isLocked):
(WTF::WordLock::isFullyReset):
Tools:
Add testing that checks that locks return to a pristine state after contention is over.
* TestWebKitAPI/Tests/WTF/Lock.cpp:
(TestWebKitAPI::LockInspector::isFullyReset):
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/166072@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-13 03:51:25 +00:00
|
|
|
namespace TestWebKitAPI {
|
|
|
|
struct LockInspector;
|
2018-12-09 07:09:09 +00:00
|
|
|
}
|
|
|
|
|
Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
This patch replaces all uses of SpinLock with Lock. Our critical sections are not no-ops, and
if you do basically anything in your critical section, the Lock overhead will be invisible.
Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
as it should be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space savings
of a lock that uses just one byte still outweigh the CPU time that Lock would save. But,
this work will enable some future work to create locks that will fit in just 1.6
bits: https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to slow path spuriously, it may go there even if
there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal with the
possibility of a null queue head.
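A minimal sketch of what tolerating a null queue head can look like, assuming a lock word whose low bit means "held" and whose remaining bits hold a pointer to the first waiter. The names WaiterNode, sketchIsHeldBit, and unlockSlowSketch are hypothetical, and the real unlockSlow() also has to wake the dequeued thread and coordinate with concurrent enqueues in ways not shown here.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical waiter record; the low bit of its address is free because of alignment.
struct WaiterNode {
    WaiterNode* next = nullptr;
};

constexpr uintptr_t sketchIsHeldBit = 1;

// Releases the lock. Returns the waiter that should be woken, or nullptr if the
// queue was empty, which is what happens when the fast-path weak CAS failed
// spuriously even though no thread was waiting.
WaiterNode* unlockSlowSketch(std::atomic<uintptr_t>& word)
{
    for (;;) {
        uintptr_t value = word.load();
        auto* head = reinterpret_cast<WaiterNode*>(value & ~sketchIsHeldBit);
        if (!head) {
            // Null queue head: nothing to dequeue, so just release the lock.
            if (word.compare_exchange_weak(value, 0))
                return nullptr;
            continue;
        }
        // Pop the head and release the lock in one CAS.
        uintptr_t newValue = reinterpret_cast<uintptr_t>(head->next);
        if (word.compare_exchange_weak(value, newValue))
            return head;
    }
}
```

The retry loop is what makes a weak CAS safe here: any failure, spurious or not, simply reloads the word and tries again.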
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
namespace WTF {
The GC should be optionally concurrent and disabled by default
https://bugs.webkit.org/show_bug.cgi?id=164454
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
This started out as a patch to have the GC scan the stack at the end, and then the
outage happened and I decided to pick a more aggressive target: give the GC a concurrent
mode that can be enabled at runtime, and whose only effect is that it turns on the
ResumeTheWorldScope. This gives our GC a really intuitive workflow: by default, the GC
thread is running solo with the world stopped and the parallel markers converged and
waiting. We have a parallel work scope to enable the parallel markers and now we have a
ResumeTheWorldScope that will optionally resume the world and then stop it again.
It's easy to make a concurrent GC that always instantly crashes. I can't promise that
this one won't do that when you run it. I set a specific goal: I wanted to do >10
concurrent GCs in debug mode with generations, optimizing JITs, and parallel marking
disabled.
To reach this milestone, I needed to do a bunch of stuff:
- The mutator needs a separate mark stack for the barrier, since it will mutate this
stack concurrently to the collector's slot visitors.
- The use of CellState to indicate whether an object is being scanned the first time or
a subsequent time was racy. It fails spectacularly when a barrier is fired at the same
time as visitChildren is running or if the barrier runs at the same time as the GC
marks the same object. So, I split SlotVisitor's mark stacks. It's now the case that
you know why you're being scanned by looking at which stack you came off of.
- All of root marking must be in the collector fixpoint. I renamed markRoots to
markToFixpoint. They say concurrency is hard, but the collector looks more intuitive
this way. We never gained anything from forcing people to make a choice between
scanning something in the fixpoint versus outside of it. Because root scanning is
cheap, we can afford to do it repeatedly, which means all root scanning can now do
constraint-based marking (like: I'll mark you if that thing is marked).
- JSObject::visitChildren's scanning of the butterfly raced with property additions,
indexed storage transitions and resizing, and a bunch of miscellaneous dirty butterfly
reshaping functions - like the one that flattens a dictionary and some sneaky
ArrayStorage transformations. Many of these can be fixed by using store-store fences
in the mutator and load-load fences in the collector. I've adopted the rule that the
collector must always see either a butterfly and structure that match or a newer
butterfly with an older structure, where their age is just one transition apart. This
can be achieved with fences. For the cases where it breaks down, I added a lock to
every JSCell. This is a full-fledged WTF lock that we sneak into two available bits in
the indexingType. See the WTF ChangeLog for details.
The mutator fencing rules are as follows:
- Store-store fence before and after setting the butterfly.
- Store-store fence before setting structure if you had changed the shape of the
butterfly.
- Store-store fence after initializing all fields in an allocation.
- A dictionary Structure can change in strange ways while the GC is trying to scan it.
So, JSObject::visitChildren will now grab the object's structure's lock if the
object's structure is a dictionary. Dictionary structures are 1:1 with their object,
so this does not reduce GC parallelism (super unlikely that the GC will simultaneously
scan an object from two threads).
- The GC can blow away a Structure's property table at any time. As a small consolation,
it's now holding the Structure's lock when it does so. But there was tons of code in
Structure that uses DeferGC to prevent the GC from blowing away the property table.
This doesn't work with concurrent GC, since DeferGC only means that the GC won't run
its safepoint (i.e. stop-the-world code) in the DeferGC region. It will still do
marking and it was the Structure::visitChildren that would delete the table. It turns
out that Structure's reliance on the property table not being deleted was the product
of code rot. We already had functions that would materialize the table on demand. We
were simply making the mistake of saying:
structure->materializePropertyMap();
...
structure->propertyTable()->things
Instead of saying:
PropertyTable* table = structure->ensurePropertyTable();
...
table->things
Switching the code to use the latter idiom allowed me to simplify the code a lot while
fixing the race.
- The LLInt's get_by_val handling was broken because the indexing shape constants were
wrong. Once I started putting more things into the IndexingType, that started causing
crashes for me. So I fixed LLInt. That turned out to be a lot of work, since that code
had rotted in subtle ways.
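The mutator fencing rules and the collector's load-load fence described above can be sketched as follows. This is an illustration under assumptions: ObjectSketch, transitionWithNewButterfly, and snapshotForVisit are hypothetical names, and C++ release/acquire fences are used as a portable stand-in for the store-store and load-load fences that the patch emits (they are at least as strong).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the two JSObject fields that race with the collector.
struct ObjectSketch {
    std::atomic<uint32_t> structureID { 0 };
    std::atomic<void*> butterfly { nullptr };
};

struct SnapshotSketch {
    uint32_t structureID;
    void* butterfly;
};

inline void storeStoreFenceSketch() { std::atomic_thread_fence(std::memory_order_release); }
inline void loadLoadFenceSketch() { std::atomic_thread_fence(std::memory_order_acquire); }

// Mutator side: fence before and after setting the butterfly. The second fence
// also serves as the fence before setting the structure after a shape change.
void transitionWithNewButterfly(ObjectSketch& object, void* newButterfly, uint32_t newStructureID)
{
    storeStoreFenceSketch(); // The butterfly's contents are initialized before it is published.
    object.butterfly.store(newButterfly, std::memory_order_relaxed);
    storeStoreFenceSketch(); // The new butterfly is visible before the new structure.
    object.structureID.store(newStructureID, std::memory_order_relaxed);
}

// Collector side: read the structure first, then the butterfly. If the new
// structure is observed, the butterfly read is at least as new; otherwise the
// collector may see a newer butterfly with an older structure.
SnapshotSketch snapshotForVisit(const ObjectSketch& object)
{
    uint32_t structureID = object.structureID.load(std::memory_order_relaxed);
    loadLoadFenceSketch();
    void* butterfly = object.butterfly.load(std::memory_order_relaxed);
    return { structureID, butterfly };
}
```

Ordering the stores butterfly-then-structure and the loads structure-then-butterfly is what yields the "matching, or newer butterfly with older structure" invariant; the cases where this breaks down are the ones that need the per-cell lock.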
This is a speed-up in SunSpider, probably because of the LLInt fix. This is neutral on
Octane and Kraken. It's a small slow-down on LongSpider, but I think we can ignore
that (we don't view LongSpider as an official benchmark). By default, the concurrent GC
is disabled: in all of the places where it would have resumed the world to run marking
concurrently to the mutator, it will just skip the resume step. When you enable
concurrent GC (--useConcurrentGC=true), it can sometimes run Octane/splay to completion.
It seems to perform quite well: on my machine, it improves both splay-throughput and
splay-latency. It's probably unstable for other programs.
* API/JSVirtualMachine.mm:
(-[JSVirtualMachine isOldExternalObject:]):
* assembler/MacroAssemblerARMv7.h:
(JSC::MacroAssemblerARMv7::storeFence):
* bytecode/InlineAccess.cpp:
(JSC::InlineAccess::dumpCacheSizesAndCrash):
(JSC::InlineAccess::generateSelfPropertyAccess):
(JSC::InlineAccess::generateArrayLength):
* bytecode/ObjectAllocationProfile.h:
(JSC::ObjectAllocationProfile::offsetOfInlineCapacity):
(JSC::ObjectAllocationProfile::ObjectAllocationProfile):
(JSC::ObjectAllocationProfile::initialize):
(JSC::ObjectAllocationProfile::inlineCapacity):
(JSC::ObjectAllocationProfile::clear):
* bytecode/PolymorphicAccess.cpp:
(JSC::AccessCase::generateWithGuard):
(JSC::AccessCase::generateImpl):
* dfg/DFGArrayifySlowPathGenerator.h:
* dfg/DFGClobberize.h:
(JSC::DFG::clobberize):
* dfg/DFGOSRExitCompiler32_64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOSRExitCompiler64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOperations.cpp:
* dfg/DFGPlan.cpp:
(JSC::DFG::Plan::markCodeBlocks):
(JSC::DFG::Plan::rememberCodeBlocks):
* dfg/DFGPlan.h:
* dfg/DFGSpeculativeJIT.cpp:
(JSC::DFG::SpeculativeJIT::emitAllocateRawObject):
(JSC::DFG::SpeculativeJIT::checkArray):
(JSC::DFG::SpeculativeJIT::arrayify):
(JSC::DFG::SpeculativeJIT::compileMakeRope):
(JSC::DFG::SpeculativeJIT::compileNewFunctionCommon):
(JSC::DFG::SpeculativeJIT::compileCreateActivation):
(JSC::DFG::SpeculativeJIT::compileCreateDirectArguments):
(JSC::DFG::SpeculativeJIT::compileSpread):
(JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileNewStringObject):
(JSC::DFG::SpeculativeJIT::compileNewTypedArray):
(JSC::DFG::SpeculativeJIT::compileStoreBarrier):
* dfg/DFGSpeculativeJIT64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
(JSC::DFG::SpeculativeJIT::compileAllocateNewArrayWithSize):
* dfg/DFGTierUpCheckInjectionPhase.cpp:
(JSC::DFG::TierUpCheckInjectionPhase::run):
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::markCodeBlocks):
(JSC::DFG::Worklist::rememberCodeBlocks):
(JSC::DFG::markCodeBlocks):
(JSC::DFG::completeAllPlansForVM):
(JSC::DFG::rememberCodeBlocks):
* dfg/DFGWorklist.h:
* ftl/FTLAbstractHeapRepository.cpp:
(JSC::FTL::AbstractHeapRepository::AbstractHeapRepository):
(JSC::FTL::AbstractHeapRepository::computeRangesAndDecorateInstructions):
* ftl/FTLAbstractHeapRepository.h:
* ftl/FTLJITCode.cpp:
(JSC::FTL::JITCode::~JITCode):
* ftl/FTLLowerDFGToB3.cpp:
(JSC::FTL::DFG::LowerDFGToB3::compilePutStructure):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::compileNewFunction):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateDirectArguments):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateRest):
(JSC::FTL::DFG::LowerDFGToB3::compileNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArray):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayBuffer):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSize):
(JSC::FTL::DFG::LowerDFGToB3::compileNewTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::compileMakeRope):
(JSC::FTL::DFG::LowerDFGToB3::compileMultiPutByOffset):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::splatWords):
(JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::reallocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::allocateObject):
(JSC::FTL::DFG::LowerDFGToB3::isArrayType):
(JSC::FTL::DFG::LowerDFGToB3::emitStoreBarrier):
(JSC::FTL::DFG::LowerDFGToB3::mutatorFence):
(JSC::FTL::DFG::LowerDFGToB3::setButterfly):
* ftl/FTLOSRExitCompiler.cpp:
(JSC::FTL::compileStub):
* ftl/FTLOutput.cpp:
(JSC::FTL::Output::signExt32ToPtr):
(JSC::FTL::Output::fence):
* ftl/FTLOutput.h:
* heap/CellState.h:
* heap/GCSegmentedArray.h:
* heap/Heap.cpp:
(JSC::Heap::ResumeTheWorldScope::ResumeTheWorldScope):
(JSC::Heap::ResumeTheWorldScope::~ResumeTheWorldScope):
(JSC::Heap::Heap):
(JSC::Heap::~Heap):
(JSC::Heap::harvestWeakReferences):
(JSC::Heap::finalizeUnconditionalFinalizers):
(JSC::Heap::completeAllJITPlans):
(JSC::Heap::markToFixpoint):
(JSC::Heap::gatherStackRoots):
(JSC::Heap::beginMarking):
(JSC::Heap::visitConservativeRoots):
(JSC::Heap::visitCompilerWorklistWeakReferences):
(JSC::Heap::updateObjectCounts):
(JSC::Heap::endMarking):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::collectInThread):
(JSC::Heap::stopTheWorld):
(JSC::Heap::resumeTheWorld):
(JSC::Heap::setGCDidJIT):
(JSC::Heap::setNeedFinalize):
(JSC::Heap::setMutatorWaiting):
(JSC::Heap::clearMutatorWaiting):
(JSC::Heap::finalize):
(JSC::Heap::flushWriteBarrierBuffer):
(JSC::Heap::writeBarrierSlowPath):
(JSC::Heap::canCollect):
(JSC::Heap::reportExtraMemoryVisited):
(JSC::Heap::reportExternalMemoryVisited):
(JSC::Heap::notifyIsSafeToCollect):
(JSC::Heap::markRoots): Deleted.
(JSC::Heap::visitExternalRememberedSet): Deleted.
(JSC::Heap::visitSmallStrings): Deleted.
(JSC::Heap::visitProtectedObjects): Deleted.
(JSC::Heap::visitArgumentBuffers): Deleted.
(JSC::Heap::visitException): Deleted.
(JSC::Heap::visitStrongHandles): Deleted.
(JSC::Heap::visitHandleStack): Deleted.
(JSC::Heap::visitSamplingProfiler): Deleted.
(JSC::Heap::visitTypeProfiler): Deleted.
(JSC::Heap::visitShadowChicken): Deleted.
(JSC::Heap::traceCodeBlocksAndJITStubRoutines): Deleted.
(JSC::Heap::visitWeakHandles): Deleted.
(JSC::Heap::flushOldStructureIDTables): Deleted.
(JSC::Heap::stopAllocation): Deleted.
* heap/Heap.h:
(JSC::Heap::collectorSlotVisitor):
(JSC::Heap::mutatorMarkStack):
(JSC::Heap::mutatorShouldBeFenced):
(JSC::Heap::addressOfMutatorShouldBeFenced):
(JSC::Heap::slotVisitor): Deleted.
(JSC::Heap::notifyIsSafeToCollect): Deleted.
(JSC::Heap::barrierShouldBeFenced): Deleted.
(JSC::Heap::addressOfBarrierShouldBeFenced): Deleted.
* heap/MarkStack.cpp:
(JSC::MarkStackArray::transferTo):
* heap/MarkStack.h:
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::tryAllocateIn):
* heap/MarkedBlock.cpp:
(JSC::MarkedBlock::MarkedBlock):
(JSC::MarkedBlock::Handle::specializedSweep):
(JSC::MarkedBlock::Handle::sweep):
(JSC::MarkedBlock::Handle::sweepHelperSelectMarksMode):
(JSC::MarkedBlock::Handle::stopAllocating):
(JSC::MarkedBlock::Handle::resumeAllocating):
(JSC::MarkedBlock::aboutToMarkSlow):
(JSC::MarkedBlock::Handle::didConsumeFreeList):
(JSC::SetNewlyAllocatedFunctor::SetNewlyAllocatedFunctor): Deleted.
(JSC::SetNewlyAllocatedFunctor::operator()): Deleted.
* heap/MarkedBlock.h:
* heap/MarkedSpace.cpp:
(JSC::MarkedSpace::resumeAllocating):
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::SlotVisitor):
(JSC::SlotVisitor::~SlotVisitor):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::clearMarkStacks):
(JSC::SlotVisitor::appendJSCellOrAuxiliary):
(JSC::SlotVisitor::setMarkedAndAppendToMarkStack):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::appendToMutatorMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::donateKnownParallel):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::containsOpaqueRoot):
(JSC::SlotVisitor::donateAndDrain):
(JSC::SlotVisitor::mergeOpaqueRoots):
(JSC::SlotVisitor::dump):
(JSC::SlotVisitor::clearMarkStack): Deleted.
(JSC::SlotVisitor::opaqueRootCount): Deleted.
* heap/SlotVisitor.h:
(JSC::SlotVisitor::collectorMarkStack):
(JSC::SlotVisitor::mutatorMarkStack):
(JSC::SlotVisitor::isEmpty):
(JSC::SlotVisitor::bytesVisited):
(JSC::SlotVisitor::markStack): Deleted.
(JSC::SlotVisitor::bytesCopied): Deleted.
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::reportExtraMemoryVisited):
(JSC::SlotVisitor::reportExternalMemoryVisited):
* jit/AssemblyHelpers.cpp:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
* jit/AssemblyHelpers.h:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
(JSC::AssemblyHelpers::barrierStoreLoadFence):
(JSC::AssemblyHelpers::mutatorFence):
(JSC::AssemblyHelpers::storeButterfly):
(JSC::AssemblyHelpers::jumpIfMutatorFenceNotNeeded):
(JSC::AssemblyHelpers::emitInitializeInlineStorage):
(JSC::AssemblyHelpers::emitInitializeOutOfLineStorage):
(JSC::AssemblyHelpers::jumpIfBarrierStoreLoadFenceNotNeeded): Deleted.
* jit/JITInlines.h:
(JSC::JIT::emitArrayProfilingSiteWithCell):
* jit/JITOperations.cpp:
* jit/JITPropertyAccess.cpp:
(JSC::JIT::emit_op_put_to_scope):
(JSC::JIT::emit_op_put_to_arguments):
* llint/LLIntData.cpp:
(JSC::LLInt::Data::performAssertions):
* llint/LowLevelInterpreter.asm:
* llint/LowLevelInterpreter64.asm:
* runtime/ButterflyInlines.h:
(JSC::Butterfly::create):
(JSC::Butterfly::createOrGrowPropertyStorage):
* runtime/ConcurrentJITLock.h:
(JSC::GCSafeConcurrentJITLocker::NoDefer::NoDefer): Deleted.
* runtime/GenericArgumentsInlines.h:
(JSC::GenericArguments<Type>::getOwnPropertySlotByIndex):
(JSC::GenericArguments<Type>::putByIndex):
* runtime/IndexingType.h:
* runtime/JSArray.cpp:
(JSC::JSArray::unshiftCountSlowCase):
(JSC::JSArray::unshiftCountWithArrayStorage):
* runtime/JSCell.h:
(JSC::JSCell::InternalLocker::InternalLocker):
(JSC::JSCell::InternalLocker::~InternalLocker):
(JSC::JSCell::atomicCompareExchangeCellStateWeakRelaxed):
(JSC::JSCell::atomicCompareExchangeCellStateStrong):
(JSC::JSCell::indexingTypeAndMiscOffset):
(JSC::JSCell::indexingTypeOffset): Deleted.
* runtime/JSCellInlines.h:
(JSC::JSCell::JSCell):
(JSC::JSCell::finishCreation):
(JSC::JSCell::indexingTypeAndMisc):
(JSC::JSCell::indexingType):
(JSC::JSCell::setStructure):
(JSC::JSCell::callDestructor):
(JSC::JSCell::lockInternalLock):
(JSC::JSCell::unlockInternalLock):
* runtime/JSObject.cpp:
(JSC::JSObject::visitButterfly):
(JSC::JSObject::visitChildren):
(JSC::JSFinalObject::visitChildren):
(JSC::JSObject::enterDictionaryIndexingModeWhenArrayStorageAlreadyExists):
(JSC::JSObject::createInitialUndecided):
(JSC::JSObject::createInitialInt32):
(JSC::JSObject::createInitialDouble):
(JSC::JSObject::createInitialContiguous):
(JSC::JSObject::createArrayStorage):
(JSC::JSObject::convertUndecidedToArrayStorage):
(JSC::JSObject::convertInt32ToArrayStorage):
(JSC::JSObject::convertDoubleToArrayStorage):
(JSC::JSObject::convertContiguousToArrayStorage):
(JSC::JSObject::deleteProperty):
(JSC::JSObject::defineOwnIndexedProperty):
(JSC::JSObject::increaseVectorLength):
(JSC::JSObject::ensureLengthSlow):
(JSC::JSObject::reallocateAndShrinkButterfly):
(JSC::JSObject::allocateMoreOutOfLineStorage):
(JSC::JSObject::shiftButterflyAfterFlattening):
(JSC::JSObject::growOutOfLineStorage): Deleted.
* runtime/JSObject.h:
(JSC::JSFinalObject::JSFinalObject):
(JSC::JSObject::setButterfly):
(JSC::JSObject::getOwnNonIndexPropertySlot):
(JSC::JSObject::fillCustomGetterPropertySlot):
(JSC::JSObject::getOwnPropertySlot):
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::setStructureAndButterfly): Deleted.
(JSC::JSObject::setButterflyWithoutChangingStructure): Deleted.
(JSC::JSObject::putDirectInternal): Deleted.
(JSC::JSObject::putDirectWithoutTransition): Deleted.
* runtime/JSObjectInlines.h:
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::getNonIndexPropertySlot):
(JSC::JSObject::putDirectWithoutTransition):
(JSC::JSObject::putDirectInternal):
* runtime/Options.h:
* runtime/SparseArrayValueMap.h:
* runtime/Structure.cpp:
(JSC::Structure::dumpStatistics):
(JSC::Structure::findStructuresAndMapForMaterialization):
(JSC::Structure::materializePropertyTable):
(JSC::Structure::addNewPropertyTransition):
(JSC::Structure::changePrototypeTransition):
(JSC::Structure::attributeChangeTransition):
(JSC::Structure::toDictionaryTransition):
(JSC::Structure::takePropertyTableOrCloneIfPinned):
(JSC::Structure::nonPropertyTransition):
(JSC::Structure::isSealed):
(JSC::Structure::isFrozen):
(JSC::Structure::flattenDictionaryStructure):
(JSC::Structure::pin):
(JSC::Structure::pinForCaching):
(JSC::Structure::willStoreValueSlow):
(JSC::Structure::copyPropertyTableForPinning):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::getPropertyNamesFromStructure):
(JSC::Structure::visitChildren):
(JSC::Structure::materializePropertyMap): Deleted.
(JSC::Structure::addPropertyWithoutTransition): Deleted.
(JSC::Structure::removePropertyWithoutTransition): Deleted.
(JSC::Structure::copyPropertyTable): Deleted.
(JSC::Structure::createPropertyMap): Deleted.
(JSC::PropertyTable::checkConsistency): Deleted.
(JSC::Structure::checkConsistency): Deleted.
* runtime/Structure.h:
* runtime/StructureIDBlob.h:
(JSC::StructureIDBlob::StructureIDBlob):
(JSC::StructureIDBlob::indexingTypeIncludingHistory):
(JSC::StructureIDBlob::setIndexingTypeIncludingHistory):
(JSC::StructureIDBlob::indexingTypeIncludingHistoryOffset):
(JSC::StructureIDBlob::indexingType): Deleted.
(JSC::StructureIDBlob::setIndexingType): Deleted.
(JSC::StructureIDBlob::indexingTypeOffset): Deleted.
* runtime/StructureInlines.h:
(JSC::Structure::get):
(JSC::Structure::checkOffsetConsistency):
(JSC::Structure::checkConsistency):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::addPropertyWithoutTransition):
(JSC::Structure::removePropertyWithoutTransition):
(JSC::Structure::setPropertyTable):
(JSC::Structure::putWillGrowOutOfLineStorage): Deleted.
(JSC::Structure::propertyTable): Deleted.
(JSC::Structure::suggestedNewOutOfLineStorageCapacity): Deleted.
Source/WTF:
The reason why I went to such great pains to make WTF::Lock fit in two bits is that I
knew that I would eventually need to stuff one into some miscellaneous bits of the
JSCell header. That time has come, because the concurrent GC has numerous race
conditions in visitChildren that can be trivially fixed if each object just has an
internal lock. Some cell types might use it to simply protect their entire visitChildren
function and anything that mutates the fields it touches, while other cell types might
use it as a "lock of last resort" to handle corner cases of an otherwise wait-free or
lock-free algorithm. Right now, it's used to protect certain transformations involving
indexing storage.
To make this happen, I factored the WTF::Lock algorithm into a LockAlgorithm struct that
is templatized on lock type (uint8_t for WTF::Lock), the isHeldBit value (1 for
WTF::Lock), and the hasParkedBit value (2 for WTF::Lock). This could have been done as
a templatized Lock class that basically contains Atomic<LockType>. You could then make
any field into a lock by bitwise_casting it to TemplateLock<field type, bit1, bit2>. But
this felt too dirty, so instead, LockAlgorithm has static methods that take
Atomic<LockType>& as their first argument. I think that this makes it more natural to
project a LockAlgorithm onto an existing Atomic<> field. Sadly, some places have to cast
their non-Atomic<> field to Atomic<> in order for this to work. Like so many other things
we do, this just shows that the C++ style of labeling fields that are subject to atomic
ops as atomic is counterproductive. Maybe some day I'll change LockAlgorithm to use our
other Atomics API, which does not require Atomic<>.
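To illustrate the shape of this design, here is a hedged sketch of a lock algorithm templatized on the lock type and the two bit values, with static methods that take the atomic word as their first argument. LockAlgorithmSketch is a hypothetical name, std::atomic stands in for WTF's Atomic<>, and the slow path simply yields instead of parking, so hasParkedBit is only checked for distinctness and otherwise left untouched.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical sketch of a lock projected onto two bits of an existing atomic field.
template<typename LockType, LockType isHeldBit, LockType hasParkedBit>
struct LockAlgorithmSketch {
    static_assert(!(isHeldBit & hasParkedBit), "the two lock bits must be distinct");

    static bool tryLock(std::atomic<LockType>& word)
    {
        LockType oldValue = word.load(std::memory_order_relaxed);
        for (;;) {
            if (oldValue & isHeldBit)
                return false;
            // Set only the held bit; every other bit in the field is preserved.
            LockType newValue = static_cast<LockType>(oldValue | isHeldBit);
            if (word.compare_exchange_weak(oldValue, newValue,
                    std::memory_order_acquire, std::memory_order_relaxed))
                return true;
        }
    }

    static void lock(std::atomic<LockType>& word)
    {
        // Simplification: the real slow path spins briefly and then parks the thread.
        while (!tryLock(word))
            std::this_thread::yield();
    }

    static void unlock(std::atomic<LockType>& word)
    {
        word.fetch_and(static_cast<LockType>(~isHeldBit), std::memory_order_release);
    }

    static bool isLocked(const std::atomic<LockType>& word)
    {
        return (word.load(std::memory_order_acquire) & isHeldBit) != 0;
    }
};
```

For example, LockAlgorithmSketch<uint8_t, 0x40, 0x80> would treat the top two bits of a byte-sized header field as a lock while leaving the low six bits alone; those bit positions are chosen for illustration and are not the ones JSCell uses.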
WTF::Lock now uses LockAlgorithm. The slow paths are still outlined. I don't feel too
bad about the LockAlgorithm.h header being included in so many places because we change
that algorithm so infrequently.
Also, I added a hasElapsed(time) function. This function makes it so much more natural
to write timeslicing code, which the concurrent GC has to do a lot of.
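As a rough illustration of why such a helper makes timeslicing code read naturally, here is a sketch built on std::chrono. It is not the WTF API: the real hasElapsed() takes a TimeWithDynamicClockType, whereas SketchClock, hasElapsedSketch, and runForTimeslice are names assumed for this example.

```cpp
#include <chrono>

using SketchClock = std::chrono::steady_clock;

// Returns true once the given point in time has been reached.
inline bool hasElapsedSketch(SketchClock::time_point time)
{
    return SketchClock::now() >= time;
}

// Runs units of work until the timeslice is used up; returns how many units ran.
template<typename Func>
unsigned runForTimeslice(std::chrono::milliseconds slice, const Func& doOneUnitOfWork)
{
    SketchClock::time_point deadline = SketchClock::now() + slice;
    unsigned units = 0;
    while (!hasElapsedSketch(deadline)) {
        doOneUnitOfWork();
        ++units;
    }
    return units;
}
```

The loop condition states the intent directly, with no manual subtraction of timestamps at each call site.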
* WTF.xcodeproj/project.pbxproj:
* wtf/CMakeLists.txt:
* wtf/ListDump.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl): Deleted.
* wtf/Lock.h:
(WTF::LockBase::lock):
(WTF::LockBase::tryLock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(): Deleted.
* wtf/LockAlgorithm.h: Added.
(WTF::LockAlgorithm::lockFastAssumingZero):
(WTF::LockAlgorithm::lockFast):
(WTF::LockAlgorithm::lock):
(WTF::LockAlgorithm::tryLock):
(WTF::LockAlgorithm::unlockFastAssumingZero):
(WTF::LockAlgorithm::unlockFast):
(WTF::LockAlgorithm::unlock):
(WTF::LockAlgorithm::unlockFairly):
(WTF::LockAlgorithm::isLocked):
(WTF::LockAlgorithm::lockSlow):
(WTF::LockAlgorithm::unlockSlow):
* wtf/TimeWithDynamicClockType.cpp:
(WTF::hasElapsed):
* wtf/TimeWithDynamicClockType.h:
Canonical link: https://commits.webkit.org/182434@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@208720 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-11-15 01:49:22 +00:00
typedef LockAlgorithm<uint8_t, 1, 2> DefaultLockAlgorithm;
Always use a byte-sized lock implementation
https://bugs.webkit.org/show_bug.cgi?id=147908
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* runtime/ConcurrentJITLock.h: Lock is now byte-sized and ByteLock is gone, so use Lock.
Source/WTF:
At the start of my locking algorithm crusade, I implemented Lock, which is a sizeof(void*)
lock implementation with some nice theoretical properties and good performance. Then I added
the ParkingLot abstraction and ByteLock. ParkingLot uses Lock in its implementation.
ByteLock uses ParkingLot to create a sizeof(char) lock implementation that performs like
Lock.
It turns out that ByteLock is always at least as good as Lock, and sometimes a lot better:
it requires 8x less memory on 64-bit systems. It's hard to construct a benchmark where
ByteLock is significantly slower than Lock, and when you do construct such a benchmark,
tweaking it a bit can also create a scenario where ByteLock is significantly faster than
Lock.
So, the thing that we call "Lock" should really use ByteLock's algorithm, since it is more
compact and just as fast. That's what this patch does.
But we still need to keep the old Lock algorithm, because it's used to implement ParkingLot,
which in turn is used to implement ByteLock. So this patch does this transformation:
- Move the algorithm in Lock into files called WordLock.h|cpp. Make ParkingLot use
WordLock.
- Move the algorithm in ByteLock into Lock.h|cpp. Make everyone who used ByteLock use Lock
instead. All other users of Lock now get the byte-sized lock implementation.
- Remove the old ByteLock files.
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks/LockSpeedTest.cpp:
(main):
* wtf/WordLock.cpp: Added.
(WTF::WordLock::lockSlow):
(WTF::WordLock::unlockSlow):
* wtf/WordLock.h: Added.
(WTF::WordLock::WordLock):
(WTF::WordLock::lock):
(WTF::WordLock::unlock):
(WTF::WordLock::isHeld):
(WTF::WordLock::isLocked):
* wtf/ByteLock.cpp: Removed.
* wtf/ByteLock.h: Removed.
* wtf/CMakeLists.txt:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h:
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/ParkingLot.cpp:
Tools:
All previous tests of Lock are now tests of WordLock. All previous tests of ByteLock are
now tests of Lock.
* TestWebKitAPI/Tests/WTF/Lock.cpp:
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/166025@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188323 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-12 04:20:24 +00:00
// This is a fully adaptive mutex that only requires 1 byte of storage. It has fast paths that are
2015-08-20 02:34:02 +00:00
// competitive with a spinlock (uncontended locking is inlined and is just a CAS, microcontention is
// handled by spinning and yielding), and a slow path that is competitive with std::mutex (if a lock
WTF::Lock should be fair eventually
https://bugs.webkit.org/show_bug.cgi?id=159384
Reviewed by Geoffrey Garen.
Source/WTF:
In https://webkit.org/blog/6161/locking-in-webkit/ we showed how relaxing the fairness of
locks makes them fast. That post presented lock fairness as a trade-off between two
extremes:
- Barging. A barging lock, like WTF::Lock, releases the lock in unlock() even if there was a
thread on the queue. If there was a thread on the queue, the lock is released and that
thread is made runnable. That thread may then grab the lock, or some other thread may grab
the lock first (it may barge). Usually, the barging thread is the thread that released the
lock in the first place. This maximizes throughput but hurts fairness. There is no good
theoretical bound on how unfair the lock may become, but empirical data suggests that it's
fair enough for the cases we previously measured.
- FIFO. A FIFO lock, like HandoffLock in ToyLocks.h, does not release the lock in unlock()
if there is a thread waiting. If there is a thread waiting, unlock() will make that thread
runnable and inform it that it now holds the lock. This ensures perfect round-robin
fairness and allows us to reason theoretically about how long it may take for a thread to
grab the lock. For example, if we know that only N threads are running and each one may
contend on a critical section, and each one may hold the lock for at most S seconds, then
the time it takes to grab the lock is at most N * S. Unfortunately, FIFO locks perform very badly
in most cases. This is because for the common case of short critical sections, they force
a context switch after each critical section if the lock is contended.
This change makes WTF::Lock almost as fair as FIFO while still being as fast as barging.
Thanks to this new algorithm, you can now have both of these things at the same time.
This change makes WTF::Lock eventually fair. We can almost (more on the caveats below)
guarantee that the time it takes to grab a lock is at most N * max(1ms, S). In other words, critical
sections that are longer than 1ms are always fair. For shorter critical sections, the amount
of time that any thread waits is 1ms times the number of threads. There are some caveats
that arise from our use of randomness, but even then, in the limit as the critical section
length goes to infinity, the lock becomes fair. The corner cases are unlikely to happen; our
experiments show that the lock becomes exactly as fair as a FIFO lock for any critical
section that is 1ms or longer.
The fairness mechanism is broken into two parts. WTF::Lock can now choose to unlock a lock
fairly or unfairly thanks to the new ParkingLot token mechanism. WTF::Lock knows when to use
fair unlocking based on a timeout mechanism in ParkingLot called timeToBeFair.
ParkingLot::unparkOne() and ParkingLot::parkConditionally() can now communicate with each
other via a token. unparkOne() can pass a token, which parkConditionally() will return. This
change also makes parkConditionally() a lot more precise about when it was unparked due to a
call to unparkOne(). If unparkOne() is told that a thread was unparked, then that thread is
guaranteed to report that it was unparked rather than timing out, and that thread is
guaranteed to get the token that unparkOne() passed. The token is an intptr_t. We use it as
a boolean variable in WTF::Lock, but you could use it to pass arbitrary data structures. By
default, the token is zero. WTF::Lock's unlock() will pass 1 as the token if it is doing
fair unlocking. In that case, unlock() will not release the lock, and lock() will know that
it holds the lock as soon as parkConditionally() returns. Note that this algorithm relies
on unparkOne() invoking WTF::Lock's callback while the queue lock is held, so that WTF::Lock
can make a decision about unlock strategy and inject a token while it has complete knowledge
over the state of the queue. As such, it's not immediately obvious how to implement this
algorithm on top of futexes. You really need ParkingLot!
WTF::Lock does not use fair unlocking every time. We expose a new API, Lock::unlockFairly(),
which forces the fair unlocking behavior. Additionally, ParkingLot now maintains a
per-bucket stochastic fairness timeout. When the timeout fires, the unparkOne() callback
sees UnparkResult::timeToBeFair = true. This timeout is set to be anywhere from 0ms to 1ms
at random. When a dequeue happens and there are threads that actually get dequeued, we check
if the time since the last fair unlock (the last time timeToBeFair was set to true) is
more than the timeout amount. If so, then we set timeToBeFair to true and reset the timeout.
This means that in the absence of ParkingLot collisions, fair unlocking is guaranteed to
happen at least once per millisecond. It will happen at 2 kHz on average. If there are
collisions, then each collision adds one millisecond to the worst case (and 0.5 ms to the
average case). The reason why we don't just use a fixed 1ms timeout is that we want to avoid
resonance. Imagine a program in which some thread acquires a lock at 1 kHz in-phase with the
timeToBeFair timeout. Then this thread would be the beneficiary of fairness to the detriment
of everyone else. Randomness ensures that we aren't too fair to any one thread.
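The timeout logic above can be sketched roughly as follows. This is an illustration under assumptions, not the ParkingLot code: ToyFairnessTimer and its members are hypothetical names, the clock is passed in so the behavior is easy to check, and the timeout is drawn with microsecond granularity. Because the timeout is uniform over [0ms, 1ms), its mean is 0.5 ms, which is where the 2 kHz average comes from.

```cpp
#include <cassert>
#include <chrono>
#include <random>

using Clock = std::chrono::steady_clock;

// Hypothetical per-bucket state; the names are illustrative, not ParkingLot's.
class ToyFairnessTimer {
public:
    explicit ToyFairnessTimer(unsigned seed)
        : m_random(seed)
    {
    }

    // Called when a dequeue actually removes a thread. Returns the value that
    // the unparkOne() callback would see as timeToBeFair.
    bool timeToBeFair(Clock::time_point now)
    {
        if (now <= m_nextFairTime)
            return false;
        // Re-arm with a random timeout in [0ms, 1ms) to avoid resonance.
        std::uniform_int_distribution<int> microseconds(0, 999);
        m_nextFairTime = now + std::chrono::microseconds(microseconds(m_random));
        return true;
    }

private:
    std::mt19937 m_random;
    Clock::time_point m_nextFairTime { }; // Clock epoch, so the first dequeue is fair.
};
```

Whatever timeout is drawn, a dequeue that happens a full millisecond after the previous fair one is always fair again, matching the worst case stated above for the collision-free case.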
Empirically, this is neutral on our major benchmarks like JetStream but it's an enormous
improvement in LockFairnessTest. It's common for an unfair lock (either our BargingLock, the
old WTF::Lock, any of the other futex-based locks that barge, or the new os_unfair_lock) to
allow only one thread to hold the lock during a whole second in which each thread is holding
the lock for 1ms at a time. This is because in a barging lock, releasing a lock after
holding it for 1ms and then reacquiring it immediately virtually ensures that none of the
other threads can wake up in time to grab it before it's relocked. But the new WTF::Lock
handles this case like a champ: each thread gets equal turns.
Here's some data. If we launch 10 threads and have each of them run for 1 second while
repeatedly holding a critical section for 1ms, then here's how many times each thread gets
to hold the lock using the old WTF::Lock algorithm:
799, 6, 1, 1, 1, 1, 1, 1, 1, 1
One thread hogged the lock for almost the whole time! With the new WTF::Lock, the lock
becomes totally fair:
80, 79, 79, 79, 79, 79, 79, 80, 80, 79
I don't know of anyone creating such an automatically-fair adaptive lock before, so I think
that this is a pretty awesome advancement to the state of the art!
This change is good for three reasons:
- We do have long critical sections in WebKit and we don't want to have to worry about
starvation. This reduces the likelihood that we will see starvation due to our lock
strategy.
- I was talking to ggaren about bmalloc's locking needs, and he wanted unlockFairly() or
lockFairly() or some moral equivalent for the scavenger thread.
- If we use a WTF::Lock to manage heap access in a multithreaded GC, we'll need the ability
to unlock and relock without barging.
* benchmarks/LockFairnessTest.cpp:
(main):
* benchmarks/ToyLocks.h:
* wtf/Condition.h:
(WTF::ConditionBase::waitUntil):
(WTF::ConditionBase::notifyOne):
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl):
* wtf/Lock.h:
(WTF::LockBase::try_lock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(WTF::LockBase::isFullyReset):
* wtf/ParkingLot.cpp:
(WTF::ParkingLot::parkConditionallyImpl):
(WTF::ParkingLot::unparkOne):
(WTF::ParkingLot::unparkOneImpl):
(WTF::ParkingLot::unparkAll):
* wtf/ParkingLot.h:
(WTF::ParkingLot::parkConditionally):
(WTF::ParkingLot::compareAndPark):
(WTF::ParkingLot::unparkOne):
Tools:
* TestWebKitAPI/Tests/WTF/ParkingLot.cpp:
Canonical link: https://commits.webkit.org/178039@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@203350 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-07-18 18:32:52 +00:00
// cannot be acquired in a short period of time, the thread is put to sleep until the lock is
// available again). It uses less memory than a std::mutex. This lock guarantees eventual stochastic
// fairness, even in programs that relock the lock immediately after unlocking it. Except when there
// are collisions between this lock and other locks in the ParkingLot, this lock will guarantee that
// at worst one call to unlock() per millisecond will do a direct hand-off to the thread that is at
// the head of the queue. When there are collisions, each collision increases the fair unlock delay
// by one millisecond in the worst case.
2021-05-30 20:35:59 +00:00
//
// This lock type supports thread safety analysis.
// To annotate a member variable or a global variable with thread ownership information,
// use lock capability annotations defined in ThreadSafetyAnalysis.h.
class WTF_CAPABILITY_LOCK Lock {
    WTF_MAKE_NONCOPYABLE(Lock);
2017-12-07 03:52:09 +00:00
    WTF_MAKE_FAST_ALLOCATED;
public:
2021-05-30 20:35:59 +00:00
    constexpr Lock() = default;
2015-08-11 19:51:35 +00:00
2021-05-30 20:35:59 +00:00
    void lock() WTF_ACQUIRES_LOCK()
Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
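The three levels of adaptation can be illustrated with a toy lock. This is a sketch under stated assumptions, not the WTF::Lock implementation: ToyAdaptiveLock, spinLimit and runAdaptiveDemo are hypothetical names, the spin count of 40 is arbitrary, and a std::mutex plus std::condition_variable stand in for the queue of waiting threads, so this toy is far larger than sizeof(void*).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Toy adaptive lock. The mutex/condition variable pair is only a stand-in for
// the queue of sleeping threads described in the text.
class ToyAdaptiveLock {
public:
    void lock()
    {
        // Level 1: fast path, one CAS on a lock that is not held.
        if (tryAcquire())
            return;
        // Level 2: a short burst of spinning for microcontention.
        for (int i = 0; i < spinLimit; ++i) {
            std::this_thread::yield();
            if (tryAcquire())
                return;
        }
        // Level 3: go to sleep until an unlock() wakes us up.
        std::unique_lock<std::mutex> guard(m_sleepLock);
        ++m_sleepers;
        m_wakeup.wait(guard, [&] { return tryAcquire(); });
        --m_sleepers;
    }

    void unlock()
    {
        m_isHeld.store(false);
        if (!m_sleepers.load())
            return; // Nobody is asleep: the mutex is never touched.
        std::lock_guard<std::mutex> guard(m_sleepLock);
        m_wakeup.notify_one();
    }

private:
    static constexpr int spinLimit = 40; // Arbitrary value chosen for the sketch.

    bool tryAcquire()
    {
        bool expected = false;
        return m_isHeld.compare_exchange_strong(expected, true);
    }

    std::atomic<bool> m_isHeld { false };
    std::atomic<int> m_sleepers { 0 };
    std::mutex m_sleepLock;
    std::condition_variable m_wakeup;
};

// Contended increments; returns the final counter value.
int runAdaptiveDemo(int threadCount, int iterations)
{
    ToyAdaptiveLock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&] {
            for (int j = 0; j < iterations; ++j) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& thread : threads)
        thread.join();
    return counter;
}
```

The sleeper count is incremented before the final acquire attempt and read after the release store, both with sequentially consistent ordering, so an unlock either sees the sleeper and notifies it or the sleeper sees the lock as free.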
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
This patch replaces all uses of SpinLock with Lock. Our critical sections are not no-ops, and
if you do basically anything in your critical section, the Lock overhead will be invisible.
Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
as it should be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space savings
of a lock that uses just one byte still outweigh the CPU time that Lock would save.
But, this work will enable some future work to create locks that will fit in just 1.6
bits: https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to slow path spuriously, it may go there even if
there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal with the
possibility of a null queue head.
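The weak-CAS issue can be sketched in isolation. The following is a hypothetical model, not the patched code: ToyWordLock and its encoding (0 for not held, 1 for held with an empty queue) are assumptions made for illustration, and the queue itself is omitted. The point is only that the slow path has to cope with being entered when nobody is queued.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy lock word: 0 means not held, 1 means held with an empty queue. Any other
// value would be a pointer to the queue head; this sketch never creates one.
struct ToyWordLock {
    std::atomic<uintptr_t> word { 0 };
    int slowPathEntries { 0 }; // Instrumentation for the sketch only.

    void lock()
    {
        uintptr_t expected = 0;
        while (!word.compare_exchange_weak(expected, 1))
            expected = 0; // Spin; a real lock would queue and sleep instead.
    }

    void unlock()
    {
        uintptr_t expected = 1;
        if (word.compare_exchange_weak(expected, 0))
            return;
        unlockSlow(); // Reached with a real queue or after a spurious failure.
    }

    void unlockSlow()
    {
        ++slowPathEntries;
        for (;;) {
            uintptr_t current = word.load();
            if (current == 1) {
                // Null queue head: we got here spuriously. Retry the release.
                if (word.compare_exchange_weak(current, 0))
                    return;
                continue;
            }
            // A non-null queue head would be dequeued and woken here.
            return;
        }
    }
};
```

On hardware with a strong CAS the slow path in this toy is never taken from unlock(), which is why a bug of this kind can stay hidden until the code runs on an architecture with a truly weak CAS.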
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
    {
The GC should be optionally concurrent and disabled by default
https://bugs.webkit.org/show_bug.cgi?id=164454
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
This started out as a patch to have the GC scan the stack at the end, and then the
outage happened and I decided to pick a more aggressive target: give the GC a concurrent
mode that can be enabled at runtime, and whose only effect is that it turns on the
ResumeTheWorldScope. This gives our GC a really intuitive workflow: by default, the GC
thread is running solo with the world stopped and the parallel markers converged and
waiting. We have a parallel work scope to enable the parallel markers and now we have a
ResumeTheWorldScope that will optionally resume the world and then stop it again.
It's easy to make a concurrent GC that always instantly crashes. I can't promise that
this one won't do that when you run it. I set a specific goal: I wanted to do >10
concurrent GCs in debug mode with generations, optimizing JITs, and parallel marking
disabled.
To reach this milestone, I needed to do a bunch of stuff:
- The mutator needs a separate mark stack for the barrier, since it will mutate this
stack concurrently to the collector's slot visitors.
- The use of CellState to indicate whether an object is being scanned the first time or
a subsequent time was racy. It fails spectacularly when a barrier is fired at the same
time as visitChildren is running or if the barrier runs at the same time as the GC
marks the same object. So, I split SlotVisitor's mark stacks. It's now the case that
you know why you're being scanned by looking at which stack you came off of.
- All of root marking must be in the collector fixpoint. I renamed markRoots to
markToFixpoint. They say concurrency is hard, but the collector looks more intuitive
this way. We never gained anything from forcing people to make a choice between
scanning something in the fixpoint versus outside of it. Because root scanning is
cheap, we can afford to do it repeatedly, which means all root scanning can now do
constraint-based marking (like: I'll mark you if that thing is marked).
- JSObject::visitChildren's scanning of the butterfly raced with property additions,
indexed storage transitions and resizing, and a bunch of miscellaneous dirty butterfly
reshaping functions - like the one that flattens a dictionary and some sneaky
ArrayStorage transformations. Many of these can be fixed by using store-store fences
in the mutator and load-load fences in the collector. I've adopted the rule that the
collector must always see either a butterfly and structure that match or a newer
butterfly with an older structure, where their age is just one transition apart. This
can be achieved with fences. For the cases where it breaks down, I added a lock to
every JSCell. This is a full-fledged WTF lock that we sneak into two available bits in
the indexingType. See the WTF ChangeLog for details.
The mutator fencing rules are as follows:
- Store-store fence before and after setting the butterfly.
- Store-store fence before setting structure if you had changed the shape of the
butterfly.
- Store-store fence after initializing all fields in an allocation.
- A dictionary Structure can change in strange ways while the GC is trying to scan it.
So, JSObject::visitChildren will now grab the object's structure's lock if the
object's structure is a dictionary. Dictionary structures are 1:1 with their object,
so this does not reduce GC parallelism (super unlikely that the GC will simultaneously
scan an object from two threads).
- The GC can blow away a Structure's property table at any time. As a small consolation,
it's now holding the Structure's lock when it does so. But there was tons of code in
Structure that uses DeferGC to prevent the GC from blowing away the property table.
This doesn't work with concurrent GC, since DeferGC only means that the GC won't run
its safepoint (i.e. stop-the-world code) in the DeferGC region. It will still do
marking and it was the Structure::visitChildren that would delete the table. It turns
out that Structure's reliance on the property table not being deleted was the product
of code rot. We already had functions that would materialize the table on demand. We
were simply making the mistake of saying:
    structure->materializePropertyMap();
    ...
    structure->propertyTable()->things
Instead of saying:
    PropertyTable* table = structure->ensurePropertyTable();
    ...
    table->things
Switching the code to use the latter idiom allowed me to simplify the code a lot while
fixing the race.
- The LLInt's get_by_val handling was broken because the indexing shape constants were
wrong. Once I started putting more things into the IndexingType, that started causing
crashes for me. So I fixed LLInt. That turned out to be a lot of work, since that code
had rotted in subtle ways.
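The butterfly/structure fencing rule from the list above can be sketched with standard C++ fences. This is a simplified illustration, not JSC code: ToyObject, transition, snapshotIsScannable and runFenceDemo are hypothetical names, integers stand in for the butterfly and the structure, and std::atomic_thread_fence with release and acquire ordering is used as a portable stand-in that is at least as strong as the store-store and load-load fences the text mentions. The sketch uses a single fence on each side and checks only that the collector never sees a structure that is newer than the butterfly it reads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Toy object: plain integers stand in for the butterfly and the structure.
// A transition to version N installs butterfly N, then structure N.
struct ToyObject {
    std::atomic<int> butterfly { 0 };
    std::atomic<int> structure { 0 };
};

// Mutator side: publish the new butterfly before the structure that describes it.
void transition(ToyObject& object, int version)
{
    object.butterfly.store(version, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release); // Stand-in for a store-store fence.
    object.structure.store(version, std::memory_order_relaxed);
}

// Collector side: read the structure first, then the butterfly.
bool snapshotIsScannable(const ToyObject& object)
{
    int structure = object.structure.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire); // Stand-in for a load-load fence.
    int butterfly = object.butterfly.load(std::memory_order_relaxed);
    return butterfly >= structure; // Matching, or a newer butterfly with an older structure.
}

// Runs a mutator against a concurrently scanning collector thread.
bool runFenceDemo(int transitions)
{
    ToyObject object;
    std::atomic<bool> done { false };
    bool alwaysScannable = true;
    std::thread collector([&] {
        while (!done.load(std::memory_order_acquire))
            alwaysScannable = alwaysScannable && snapshotIsScannable(object);
        alwaysScannable = alwaysScannable && snapshotIsScannable(object);
    });
    for (int version = 1; version <= transitions; ++version)
        transition(object, version);
    done.store(true, std::memory_order_release);
    collector.join();
    return alwaysScannable;
}
```

The toy does not model the "one transition apart" bound or the cases that need the per-cell lock; it only shows why the ordering of the two stores and the two loads matters.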
This is a speed-up in SunSpider, probably because of the LLInt fix. This is neutral on
Octane and Kraken. It's a small slow-down on LongSpider, but I think we can ignore
that (we don't view LongSpider as an official benchmark). By default, the concurrent GC
is disabled: in all of the places where it would have resumed the world to run marking
concurrently to the mutator, it will just skip the resume step. When you enable
concurrent GC (--useConcurrentGC=true), it can sometimes run Octane/splay to completion.
It seems to perform quite well: on my machine, it improves both splay-throughput and
splay-latency. It's probably unstable for other programs.
* API/JSVirtualMachine.mm:
(-[JSVirtualMachine isOldExternalObject:]):
* assembler/MacroAssemblerARMv7.h:
(JSC::MacroAssemblerARMv7::storeFence):
* bytecode/InlineAccess.cpp:
(JSC::InlineAccess::dumpCacheSizesAndCrash):
(JSC::InlineAccess::generateSelfPropertyAccess):
(JSC::InlineAccess::generateArrayLength):
* bytecode/ObjectAllocationProfile.h:
(JSC::ObjectAllocationProfile::offsetOfInlineCapacity):
(JSC::ObjectAllocationProfile::ObjectAllocationProfile):
(JSC::ObjectAllocationProfile::initialize):
(JSC::ObjectAllocationProfile::inlineCapacity):
(JSC::ObjectAllocationProfile::clear):
* bytecode/PolymorphicAccess.cpp:
(JSC::AccessCase::generateWithGuard):
(JSC::AccessCase::generateImpl):
* dfg/DFGArrayifySlowPathGenerator.h:
* dfg/DFGClobberize.h:
(JSC::DFG::clobberize):
* dfg/DFGOSRExitCompiler32_64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOSRExitCompiler64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOperations.cpp:
* dfg/DFGPlan.cpp:
(JSC::DFG::Plan::markCodeBlocks):
(JSC::DFG::Plan::rememberCodeBlocks):
* dfg/DFGPlan.h:
* dfg/DFGSpeculativeJIT.cpp:
(JSC::DFG::SpeculativeJIT::emitAllocateRawObject):
(JSC::DFG::SpeculativeJIT::checkArray):
(JSC::DFG::SpeculativeJIT::arrayify):
(JSC::DFG::SpeculativeJIT::compileMakeRope):
(JSC::DFG::SpeculativeJIT::compileNewFunctionCommon):
(JSC::DFG::SpeculativeJIT::compileCreateActivation):
(JSC::DFG::SpeculativeJIT::compileCreateDirectArguments):
(JSC::DFG::SpeculativeJIT::compileSpread):
(JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileNewStringObject):
(JSC::DFG::SpeculativeJIT::compileNewTypedArray):
(JSC::DFG::SpeculativeJIT::compileStoreBarrier):
* dfg/DFGSpeculativeJIT64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
(JSC::DFG::SpeculativeJIT::compileAllocateNewArrayWithSize):
* dfg/DFGTierUpCheckInjectionPhase.cpp:
(JSC::DFG::TierUpCheckInjectionPhase::run):
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::markCodeBlocks):
(JSC::DFG::Worklist::rememberCodeBlocks):
(JSC::DFG::markCodeBlocks):
(JSC::DFG::completeAllPlansForVM):
(JSC::DFG::rememberCodeBlocks):
* dfg/DFGWorklist.h:
* ftl/FTLAbstractHeapRepository.cpp:
(JSC::FTL::AbstractHeapRepository::AbstractHeapRepository):
(JSC::FTL::AbstractHeapRepository::computeRangesAndDecorateInstructions):
* ftl/FTLAbstractHeapRepository.h:
* ftl/FTLJITCode.cpp:
(JSC::FTL::JITCode::~JITCode):
* ftl/FTLLowerDFGToB3.cpp:
(JSC::FTL::DFG::LowerDFGToB3::compilePutStructure):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::compileNewFunction):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateDirectArguments):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateRest):
(JSC::FTL::DFG::LowerDFGToB3::compileNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArray):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayBuffer):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSize):
(JSC::FTL::DFG::LowerDFGToB3::compileNewTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::compileMakeRope):
(JSC::FTL::DFG::LowerDFGToB3::compileMultiPutByOffset):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::splatWords):
(JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::reallocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::allocateObject):
(JSC::FTL::DFG::LowerDFGToB3::isArrayType):
(JSC::FTL::DFG::LowerDFGToB3::emitStoreBarrier):
(JSC::FTL::DFG::LowerDFGToB3::mutatorFence):
(JSC::FTL::DFG::LowerDFGToB3::setButterfly):
* ftl/FTLOSRExitCompiler.cpp:
(JSC::FTL::compileStub):
* ftl/FTLOutput.cpp:
(JSC::FTL::Output::signExt32ToPtr):
(JSC::FTL::Output::fence):
* ftl/FTLOutput.h:
* heap/CellState.h:
* heap/GCSegmentedArray.h:
* heap/Heap.cpp:
(JSC::Heap::ResumeTheWorldScope::ResumeTheWorldScope):
(JSC::Heap::ResumeTheWorldScope::~ResumeTheWorldScope):
(JSC::Heap::Heap):
(JSC::Heap::~Heap):
(JSC::Heap::harvestWeakReferences):
(JSC::Heap::finalizeUnconditionalFinalizers):
(JSC::Heap::completeAllJITPlans):
(JSC::Heap::markToFixpoint):
(JSC::Heap::gatherStackRoots):
(JSC::Heap::beginMarking):
(JSC::Heap::visitConservativeRoots):
(JSC::Heap::visitCompilerWorklistWeakReferences):
(JSC::Heap::updateObjectCounts):
(JSC::Heap::endMarking):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::collectInThread):
(JSC::Heap::stopTheWorld):
(JSC::Heap::resumeTheWorld):
(JSC::Heap::setGCDidJIT):
(JSC::Heap::setNeedFinalize):
(JSC::Heap::setMutatorWaiting):
(JSC::Heap::clearMutatorWaiting):
(JSC::Heap::finalize):
(JSC::Heap::flushWriteBarrierBuffer):
(JSC::Heap::writeBarrierSlowPath):
(JSC::Heap::canCollect):
(JSC::Heap::reportExtraMemoryVisited):
(JSC::Heap::reportExternalMemoryVisited):
(JSC::Heap::notifyIsSafeToCollect):
(JSC::Heap::markRoots): Deleted.
(JSC::Heap::visitExternalRememberedSet): Deleted.
(JSC::Heap::visitSmallStrings): Deleted.
(JSC::Heap::visitProtectedObjects): Deleted.
(JSC::Heap::visitArgumentBuffers): Deleted.
(JSC::Heap::visitException): Deleted.
(JSC::Heap::visitStrongHandles): Deleted.
(JSC::Heap::visitHandleStack): Deleted.
(JSC::Heap::visitSamplingProfiler): Deleted.
(JSC::Heap::visitTypeProfiler): Deleted.
(JSC::Heap::visitShadowChicken): Deleted.
(JSC::Heap::traceCodeBlocksAndJITStubRoutines): Deleted.
(JSC::Heap::visitWeakHandles): Deleted.
(JSC::Heap::flushOldStructureIDTables): Deleted.
(JSC::Heap::stopAllocation): Deleted.
* heap/Heap.h:
(JSC::Heap::collectorSlotVisitor):
(JSC::Heap::mutatorMarkStack):
(JSC::Heap::mutatorShouldBeFenced):
(JSC::Heap::addressOfMutatorShouldBeFenced):
(JSC::Heap::slotVisitor): Deleted.
(JSC::Heap::notifyIsSafeToCollect): Deleted.
(JSC::Heap::barrierShouldBeFenced): Deleted.
(JSC::Heap::addressOfBarrierShouldBeFenced): Deleted.
* heap/MarkStack.cpp:
(JSC::MarkStackArray::transferTo):
* heap/MarkStack.h:
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::tryAllocateIn):
* heap/MarkedBlock.cpp:
(JSC::MarkedBlock::MarkedBlock):
(JSC::MarkedBlock::Handle::specializedSweep):
(JSC::MarkedBlock::Handle::sweep):
(JSC::MarkedBlock::Handle::sweepHelperSelectMarksMode):
(JSC::MarkedBlock::Handle::stopAllocating):
(JSC::MarkedBlock::Handle::resumeAllocating):
(JSC::MarkedBlock::aboutToMarkSlow):
(JSC::MarkedBlock::Handle::didConsumeFreeList):
(JSC::SetNewlyAllocatedFunctor::SetNewlyAllocatedFunctor): Deleted.
(JSC::SetNewlyAllocatedFunctor::operator()): Deleted.
* heap/MarkedBlock.h:
* heap/MarkedSpace.cpp:
(JSC::MarkedSpace::resumeAllocating):
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::SlotVisitor):
(JSC::SlotVisitor::~SlotVisitor):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::clearMarkStacks):
(JSC::SlotVisitor::appendJSCellOrAuxiliary):
(JSC::SlotVisitor::setMarkedAndAppendToMarkStack):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::appendToMutatorMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::donateKnownParallel):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::containsOpaqueRoot):
(JSC::SlotVisitor::donateAndDrain):
(JSC::SlotVisitor::mergeOpaqueRoots):
(JSC::SlotVisitor::dump):
(JSC::SlotVisitor::clearMarkStack): Deleted.
(JSC::SlotVisitor::opaqueRootCount): Deleted.
* heap/SlotVisitor.h:
(JSC::SlotVisitor::collectorMarkStack):
(JSC::SlotVisitor::mutatorMarkStack):
(JSC::SlotVisitor::isEmpty):
(JSC::SlotVisitor::bytesVisited):
(JSC::SlotVisitor::markStack): Deleted.
(JSC::SlotVisitor::bytesCopied): Deleted.
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::reportExtraMemoryVisited):
(JSC::SlotVisitor::reportExternalMemoryVisited):
* jit/AssemblyHelpers.cpp:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
* jit/AssemblyHelpers.h:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
(JSC::AssemblyHelpers::barrierStoreLoadFence):
(JSC::AssemblyHelpers::mutatorFence):
(JSC::AssemblyHelpers::storeButterfly):
(JSC::AssemblyHelpers::jumpIfMutatorFenceNotNeeded):
(JSC::AssemblyHelpers::emitInitializeInlineStorage):
(JSC::AssemblyHelpers::emitInitializeOutOfLineStorage):
(JSC::AssemblyHelpers::jumpIfBarrierStoreLoadFenceNotNeeded): Deleted.
* jit/JITInlines.h:
(JSC::JIT::emitArrayProfilingSiteWithCell):
* jit/JITOperations.cpp:
* jit/JITPropertyAccess.cpp:
(JSC::JIT::emit_op_put_to_scope):
(JSC::JIT::emit_op_put_to_arguments):
* llint/LLIntData.cpp:
(JSC::LLInt::Data::performAssertions):
* llint/LowLevelInterpreter.asm:
* llint/LowLevelInterpreter64.asm:
* runtime/ButterflyInlines.h:
(JSC::Butterfly::create):
(JSC::Butterfly::createOrGrowPropertyStorage):
* runtime/ConcurrentJITLock.h:
(JSC::GCSafeConcurrentJITLocker::NoDefer::NoDefer): Deleted.
* runtime/GenericArgumentsInlines.h:
(JSC::GenericArguments<Type>::getOwnPropertySlotByIndex):
(JSC::GenericArguments<Type>::putByIndex):
* runtime/IndexingType.h:
* runtime/JSArray.cpp:
(JSC::JSArray::unshiftCountSlowCase):
(JSC::JSArray::unshiftCountWithArrayStorage):
* runtime/JSCell.h:
(JSC::JSCell::InternalLocker::InternalLocker):
(JSC::JSCell::InternalLocker::~InternalLocker):
(JSC::JSCell::atomicCompareExchangeCellStateWeakRelaxed):
(JSC::JSCell::atomicCompareExchangeCellStateStrong):
(JSC::JSCell::indexingTypeAndMiscOffset):
(JSC::JSCell::indexingTypeOffset): Deleted.
* runtime/JSCellInlines.h:
(JSC::JSCell::JSCell):
(JSC::JSCell::finishCreation):
(JSC::JSCell::indexingTypeAndMisc):
(JSC::JSCell::indexingType):
(JSC::JSCell::setStructure):
(JSC::JSCell::callDestructor):
(JSC::JSCell::lockInternalLock):
(JSC::JSCell::unlockInternalLock):
* runtime/JSObject.cpp:
(JSC::JSObject::visitButterfly):
(JSC::JSObject::visitChildren):
(JSC::JSFinalObject::visitChildren):
(JSC::JSObject::enterDictionaryIndexingModeWhenArrayStorageAlreadyExists):
(JSC::JSObject::createInitialUndecided):
(JSC::JSObject::createInitialInt32):
(JSC::JSObject::createInitialDouble):
(JSC::JSObject::createInitialContiguous):
(JSC::JSObject::createArrayStorage):
(JSC::JSObject::convertUndecidedToArrayStorage):
(JSC::JSObject::convertInt32ToArrayStorage):
(JSC::JSObject::convertDoubleToArrayStorage):
(JSC::JSObject::convertContiguousToArrayStorage):
(JSC::JSObject::deleteProperty):
(JSC::JSObject::defineOwnIndexedProperty):
(JSC::JSObject::increaseVectorLength):
(JSC::JSObject::ensureLengthSlow):
(JSC::JSObject::reallocateAndShrinkButterfly):
(JSC::JSObject::allocateMoreOutOfLineStorage):
(JSC::JSObject::shiftButterflyAfterFlattening):
(JSC::JSObject::growOutOfLineStorage): Deleted.
* runtime/JSObject.h:
(JSC::JSFinalObject::JSFinalObject):
(JSC::JSObject::setButterfly):
(JSC::JSObject::getOwnNonIndexPropertySlot):
(JSC::JSObject::fillCustomGetterPropertySlot):
(JSC::JSObject::getOwnPropertySlot):
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::setStructureAndButterfly): Deleted.
(JSC::JSObject::setButterflyWithoutChangingStructure): Deleted.
(JSC::JSObject::putDirectInternal): Deleted.
(JSC::JSObject::putDirectWithoutTransition): Deleted.
* runtime/JSObjectInlines.h:
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::getNonIndexPropertySlot):
(JSC::JSObject::putDirectWithoutTransition):
(JSC::JSObject::putDirectInternal):
* runtime/Options.h:
* runtime/SparseArrayValueMap.h:
* runtime/Structure.cpp:
(JSC::Structure::dumpStatistics):
(JSC::Structure::findStructuresAndMapForMaterialization):
(JSC::Structure::materializePropertyTable):
(JSC::Structure::addNewPropertyTransition):
(JSC::Structure::changePrototypeTransition):
(JSC::Structure::attributeChangeTransition):
(JSC::Structure::toDictionaryTransition):
(JSC::Structure::takePropertyTableOrCloneIfPinned):
(JSC::Structure::nonPropertyTransition):
(JSC::Structure::isSealed):
(JSC::Structure::isFrozen):
(JSC::Structure::flattenDictionaryStructure):
(JSC::Structure::pin):
(JSC::Structure::pinForCaching):
(JSC::Structure::willStoreValueSlow):
(JSC::Structure::copyPropertyTableForPinning):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::getPropertyNamesFromStructure):
(JSC::Structure::visitChildren):
(JSC::Structure::materializePropertyMap): Deleted.
(JSC::Structure::addPropertyWithoutTransition): Deleted.
(JSC::Structure::removePropertyWithoutTransition): Deleted.
(JSC::Structure::copyPropertyTable): Deleted.
(JSC::Structure::createPropertyMap): Deleted.
(JSC::PropertyTable::checkConsistency): Deleted.
(JSC::Structure::checkConsistency): Deleted.
* runtime/Structure.h:
* runtime/StructureIDBlob.h:
(JSC::StructureIDBlob::StructureIDBlob):
(JSC::StructureIDBlob::indexingTypeIncludingHistory):
(JSC::StructureIDBlob::setIndexingTypeIncludingHistory):
(JSC::StructureIDBlob::indexingTypeIncludingHistoryOffset):
(JSC::StructureIDBlob::indexingType): Deleted.
(JSC::StructureIDBlob::setIndexingType): Deleted.
(JSC::StructureIDBlob::indexingTypeOffset): Deleted.
* runtime/StructureInlines.h:
(JSC::Structure::get):
(JSC::Structure::checkOffsetConsistency):
(JSC::Structure::checkConsistency):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::addPropertyWithoutTransition):
(JSC::Structure::removePropertyWithoutTransition):
(JSC::Structure::setPropertyTable):
(JSC::Structure::putWillGrowOutOfLineStorage): Deleted.
(JSC::Structure::propertyTable): Deleted.
(JSC::Structure::suggestedNewOutOfLineStorageCapacity): Deleted.
Source/WTF:
The reason why I went to such great pains to make WTF::Lock fit in two bits is that I
knew that I would eventually need to stuff one into some miscellaneous bits of the
JSCell header. That time has come, because the concurrent GC has numerous race
conditions in visitChildren that can be trivially fixed if each object just has an
internal lock. Some cell types might use it to simply protect their entire visitChildren
function and anything that mutates the fields it touches, while other cell types might
use it as a "lock of last resort" to handle corner cases of an otherwise wait-free or
lock-free algorithm. Right now, it's used to protect certain transformations involving
indexing storage.
To make this happen, I factored the WTF::Lock algorithm into a LockAlgorithm struct that
is templatized on lock type (uint8_t for WTF::Lock), the isHeldBit value (1 for
WTF::Lock), and the hasParkedBit value (2 for WTF::Lock). This could have been done as
a templatized Lock class that basically contains Atomic<LockType>. You could then make
any field into a lock by bitwise_casting it to TemplateLock<field type, bit1, bit2>. But
this felt too dirty, so instead, LockAlgorithm has static methods that take
Atomic<LockType>& as their first argument. I think that this makes it more natural to
project a LockAlgorithm onto an existing Atomic<> field. Sadly, some places have to cast
their non-Atomic<> field to Atomic<> in order for this to work. Like so many other things
we do, this just shows that the C++ style of labeling fields that are subject to atomic
ops as atomic is counterproductive. Maybe some day I'll change LockAlgorithm to use our
other Atomics API, which does not require Atomic<>.
WTF::Lock now uses LockAlgorithm. The slow paths are still outlined. I don't feel too
bad about the LockAlgorithm.h header being included in so many places because we change
that algorithm so infrequently.
Also, I added a hasElapsed(time) function. This function makes it so much more natural
to write timeslicing code, which the concurrent GC has to do a lot of.
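The LockAlgorithm factoring described above can be sketched roughly as follows. This is a minimal illustration, not the contents of wtf/LockAlgorithm.h: the name SketchLockAlgorithm is hypothetical, std::atomic stands in for WTF's Atomic<>, and the slow path simply yields instead of parking the thread, so hasParkedBit is declared but never set. The 0x40/0x80 bit values in the second alias are illustrative assumptions for a lock projected onto the upper bits of a byte that also stores other data.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the LockAlgorithm described in the text: a struct
// templatized on the lock word type and on the two bit values, whose static
// methods operate on an atomic word that is owned by someone else.
template<typename LockType, LockType isHeldBit, LockType hasParkedBit>
struct SketchLockAlgorithm {
    // Sets isHeldBit if it is clear, leaving every other bit of the word alone.
    static bool tryLock(std::atomic<LockType>& word)
    {
        LockType oldValue = word.load(std::memory_order_relaxed);
        while (!(oldValue & isHeldBit)) {
            LockType newValue = static_cast<LockType>(oldValue | isHeldBit);
            if (word.compare_exchange_weak(oldValue, newValue,
                    std::memory_order_acquire, std::memory_order_relaxed))
                return true;
            // On failure oldValue was reloaded; the loop re-checks isHeldBit.
        }
        return false;
    }

    // Assumption: the real slow path parks the thread and sets hasParkedBit.
    // This sketch only yields, so hasParkedBit stays unused.
    static void lock(std::atomic<LockType>& word)
    {
        while (!tryLock(word))
            std::this_thread::yield();
    }

    // Clears isHeldBit and preserves the unrelated bits.
    static void unlock(std::atomic<LockType>& word)
    {
        word.fetch_and(static_cast<LockType>(~isHeldBit), std::memory_order_release);
    }

    static bool isLocked(const std::atomic<LockType>& word)
    {
        return word.load(std::memory_order_acquire) & isHeldBit;
    }
};

// The parameters the text gives for WTF::Lock: uint8_t, isHeldBit 1, hasParkedBit 2.
using SketchByteLock = SketchLockAlgorithm<std::uint8_t, 1, 2>;

// Illustrative only: a lock projected onto the top two bits of a header byte.
using SketchHeaderLock = SketchLockAlgorithm<std::uint8_t, 0x40, 0x80>;
```

Because the methods are static and take the atomic word as their first argument, the same algorithm can be applied to a dedicated lock byte or to spare bits of an existing field, which is the property the text is after.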
* WTF.xcodeproj/project.pbxproj:
* wtf/CMakeLists.txt:
* wtf/ListDump.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl): Deleted.
* wtf/Lock.h:
(WTF::LockBase::lock):
(WTF::LockBase::tryLock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(): Deleted.
* wtf/LockAlgorithm.h: Added.
(WTF::LockAlgorithm::lockFastAssumingZero):
(WTF::LockAlgorithm::lockFast):
(WTF::LockAlgorithm::lock):
(WTF::LockAlgorithm::tryLock):
(WTF::LockAlgorithm::unlockFastAssumingZero):
(WTF::LockAlgorithm::unlockFast):
(WTF::LockAlgorithm::unlock):
(WTF::LockAlgorithm::unlockFairly):
(WTF::LockAlgorithm::isLocked):
(WTF::LockAlgorithm::lockSlow):
(WTF::LockAlgorithm::unlockSlow):
* wtf/TimeWithDynamicClockType.cpp:
(WTF::hasElapsed):
* wtf/TimeWithDynamicClockType.h:
Canonical link: https://commits.webkit.org/182434@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@208720 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-11-15 01:49:22 +00:00
|
|
|
if (UNLIKELY(!DefaultLockAlgorithm::lockFastAssumingZero(m_byte)))
|
|
|
|
lockSlow();
|
Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
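The three levels of adaptation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the WTF::Lock implementation: SketchAdaptiveLock, the constants, and runAdaptiveLockDemo are hypothetical names, the spin limit of 40 is arbitrary, and a single shared std::mutex plus std::condition_variable stands in for the queue of thread nodes and the tagged pointer that the text describes.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

constexpr std::uint8_t sketchIsHeld = 1;    // lock is held
constexpr std::uint8_t sketchHasParked = 2; // at least one thread is asleep
constexpr unsigned sketchSpinLimit = 40;    // arbitrary value for the sketch

class SketchAdaptiveLock {
public:
    void lock()
    {
        // Level 1: inlined CAS for a lock that is not held.
        std::uint8_t expected = 0;
        if (m_state.compare_exchange_weak(expected, sketchIsHeld,
                std::memory_order_acquire, std::memory_order_relaxed))
            return;
        lockSlow();
    }

    void unlock()
    {
        std::uint8_t expected = sketchIsHeld;
        if (m_state.compare_exchange_weak(expected, 0,
                std::memory_order_release, std::memory_order_relaxed))
            return;
        unlockSlow(); // somebody is parked, or the weak CAS failed spuriously
    }

private:
    // Assumption: one shared mutex and condition variable replace the queue.
    static std::mutex& parkMutex() { static std::mutex mutex; return mutex; }
    static std::condition_variable& parkCondition()
    {
        static std::condition_variable condition;
        return condition;
    }

    void lockSlow()
    {
        unsigned spinCount = 0;
        for (;;) {
            std::uint8_t current = m_state.load(std::memory_order_relaxed);
            if (!(current & sketchIsHeld)) {
                std::uint8_t desired = static_cast<std::uint8_t>(current | sketchIsHeld);
                if (m_state.compare_exchange_weak(current, desired,
                        std::memory_order_acquire, std::memory_order_relaxed))
                    return;
                continue;
            }
            // Level 2: a short burst of spinning for microcontention.
            if (spinCount < sketchSpinLimit) {
                ++spinCount;
                std::this_thread::yield();
                continue;
            }
            // Level 3: advertise that we are parked, then sleep until woken.
            std::unique_lock<std::mutex> guard(parkMutex());
            current = m_state.load(std::memory_order_relaxed);
            if (!(current & sketchIsHeld))
                continue; // released in the meantime; retry the acquire
            if (!(current & sketchHasParked)) {
                std::uint8_t desired = static_cast<std::uint8_t>(current | sketchHasParked);
                if (!m_state.compare_exchange_strong(current, desired,
                        std::memory_order_relaxed, std::memory_order_relaxed))
                    continue;
            }
            parkCondition().wait(guard, [this] {
                return !(m_state.load(std::memory_order_relaxed) & sketchHasParked);
            });
        }
    }

    void unlockSlow()
    {
        // Clearing both bits under the park mutex means a thread that has set
        // sketchHasParked cannot miss the wakeup.
        std::lock_guard<std::mutex> guard(parkMutex());
        m_state.store(0, std::memory_order_release);
        parkCondition().notify_all();
    }

    std::atomic<std::uint8_t> m_state{0};
};

// Four threads increment a shared counter 10000 times each under the lock.
inline int runAdaptiveLockDemo()
{
    SketchAdaptiveLock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&lock, &counter] {
            for (int i = 0; i < 10000; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& thread : threads)
        thread.join();
    return counter;
}
```

Woken threads in this sketch compete with newly arriving threads for the lock, and unlockSlow() works even when no thread is parked.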
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
This patch replaces all uses of SpinLock with Lock. Our critical sections are not no-ops, so
if you do basically anything in your critical section, the Lock overhead will be invisible.
Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
as it should be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space benefits
of having a lock that just uses a byte still outweigh the CPU time that Lock would save.
But, this work will enable some future work to create locks that will fit in just 1.6
bits: https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to slow path spuriously, it may go there even if
there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal with the
possibility of a null queue head.
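The hazard described above can be illustrated with a small sketch. It assumes a lock word that holds a held bit in its lowest bit and a pointer to the first waiting thread in the remaining bits. The names SketchWaiter, sketchUnlock, and sketchUnlockSlow are hypothetical, and waking a thread is reduced to setting a flag. The point is only that the slow path must check for an empty waiter list, because compare_exchange_weak is allowed to fail even when the word holds the expected value.

```cpp
#include <atomic>
#include <cstdint>

constexpr std::uintptr_t sketchHeldBit = 1;

// Hypothetical waiter node; its alignment keeps the low pointer bit free.
struct SketchWaiter {
    SketchWaiter* next = nullptr;
    bool wasWoken = false;
};

// Slow path: returns the waiter that was woken, or nullptr if nobody waited.
inline SketchWaiter* sketchUnlockSlow(std::atomic<std::uintptr_t>& word)
{
    for (;;) {
        std::uintptr_t oldWord = word.load(std::memory_order_relaxed);
        auto* head = reinterpret_cast<SketchWaiter*>(oldWord & ~sketchHeldBit);
        if (!head) {
            // We got here only because the fast-path CAS failed spuriously.
            // There is nothing to dequeue, so just release the lock.
            if (word.compare_exchange_weak(oldWord, 0,
                    std::memory_order_release, std::memory_order_relaxed))
                return nullptr;
            continue;
        }
        // Pop the first waiter and release the lock in one step.
        std::uintptr_t newWord = reinterpret_cast<std::uintptr_t>(head->next);
        if (word.compare_exchange_weak(oldWord, newWord,
                std::memory_order_release, std::memory_order_relaxed)) {
            head->wasWoken = true; // stand-in for waking the thread
            return head;
        }
    }
}

// Fast path: a weak CAS from "held, no waiters" to "not held".
inline SketchWaiter* sketchUnlock(std::atomic<std::uintptr_t>& word)
{
    std::uintptr_t expected = sketchHeldBit;
    if (word.compare_exchange_weak(expected, 0,
            std::memory_order_release, std::memory_order_relaxed))
        return nullptr;
    return sketchUnlockSlow(word);
}
```

On architectures where the weak CAS never fails spuriously, the null-head branch is never taken from the fast path, so a missing check can stay hidden until the code runs on an architecture with a truly weak CAS.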
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
|
|
|
}
|
|
|
|
|
2021-05-30 20:35:59 +00:00
|
|
|
bool tryLock() WTF_ACQUIRES_LOCK_IF(true) // NOLINT: Intentional deviation to support std::scoped_lock.
|
2015-08-15 00:14:52 +00:00
|
|
|
{
|
The GC should be optionally concurrent and disabled by default
https://bugs.webkit.org/show_bug.cgi?id=164454
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
This started out as a patch to have the GC scan the stack at the end, and then the
outage happened and I decided to pick a more aggressive target: give the GC a concurrent
mode that can be enabled at runtime, and whose only effect is that it turns on the
ResumeTheWorldScope. This gives our GC a really intuitive workflow: by default, the GC
thread is running solo with the world stopped and the parallel markers converged and
waiting. We have a parallel work scope to enable the parallel markers and now we have a
ResumeTheWorldScope that will optionally resume the world and then stop it again.
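The scope-based workflow described above can be sketched as an RAII class. This is an assumption-laden illustration rather than the code in heap/Heap.cpp: SketchHeap, its fields, and SketchResumeTheWorldScope are hypothetical, and resuming or stopping the world is reduced to flipping a flag.

```cpp
// Hypothetical, heavily simplified heap: the world is stopped by default
// while the collector runs, and concurrent mode is off by default.
struct SketchHeap {
    bool concurrentGCEnabled = false;
    bool worldIsStopped = true;

    void resumeTheWorld() { worldIsStopped = false; }
    void stopTheWorld() { worldIsStopped = true; }
};

// Resumes the world on entry only when concurrent GC is enabled, and stops it
// again on exit. With concurrent GC disabled the scope does nothing.
class SketchResumeTheWorldScope {
public:
    explicit SketchResumeTheWorldScope(SketchHeap& heap)
        : m_heap(heap)
    {
        if (!m_heap.concurrentGCEnabled)
            return;
        m_heap.resumeTheWorld();
        m_didResume = true;
    }

    ~SketchResumeTheWorldScope()
    {
        if (m_didResume)
            m_heap.stopTheWorld();
    }

    SketchResumeTheWorldScope(const SketchResumeTheWorldScope&) = delete;
    SketchResumeTheWorldScope& operator=(const SketchResumeTheWorldScope&) = delete;

private:
    SketchHeap& m_heap;
    bool m_didResume = false;
};

// Reports whether the world is stopped while a scope is alive.
inline bool worldIsStoppedInsideScope(bool concurrentGCEnabled)
{
    SketchHeap heap;
    heap.concurrentGCEnabled = concurrentGCEnabled;
    SketchResumeTheWorldScope scope(heap);
    return heap.worldIsStopped;
}
```

Collector code written this way reads the same in both modes, and only the scope's behavior changes when the option is enabled.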
It's easy to make a concurrent GC that always instantly crashes. I can't promise that
this one won't do that when you run it. I set a specific goal: I wanted to do >10
concurrent GCs in debug mode with generations, optimizing JITs, and parallel marking
disabled.
To reach this milestone, I needed to do a bunch of stuff:
- The mutator needs a separate mark stack for the barrier, since it will mutate this
stack concurrently to the collector's slot visitors.
- The use of CellState to indicate whether an object is being scanned the first time or
a subsequent time was racy. It fails spectacularly when a barrier is fired at the same
time as visitChildren is running or if the barrier runs at the same time as the GC
marks the same object. So, I split SlotVisitor's mark stacks. It's now the case that
you know why you're being scanned by looking at which stack you came off of.
- All of root marking must be in the collector fixpoint. I renamed markRoots to
markToFixpoint. They say concurrency is hard, but the collector looks more intuitive
this way. We never gained anything from forcing people to make a choice between
scanning something in the fixpoint versus outside of it. Because root scanning is
cheap, we can afford to do it repeatedly, which means all root scanning can now do
constraint-based marking (like: I'll mark you if that thing is marked).
- JSObject::visitChildren's scanning of the butterfly raced with property additions,
indexed storage transitions and resizing, and a bunch of miscellaneous dirty butterfly
reshaping functions - like the one that flattens a dictionary and some sneaky
ArrayStorage transformations. Many of these can be fixed by using store-store fences
in the mutator and load-load fences in the collector. I've adopted the rule that the
collector must always see either a butterfly and structure that match or a newer
butterfly with an older structure, where their age is just one transition apart. This
can be achieved with fences. For the cases where it breaks down, I added a lock to
every JSCell. This is a full-fledged WTF lock that we sneak into two available bits in
the indexingType. See the WTF ChangeLog for details.
The mutator fencing rules are as follows:
    - Store-store fence before and after setting the butterfly.
    - Store-store fence before setting structure if you had changed the shape of the
      butterfly.
    - Store-store fence after initializing all fields in an allocation.
- A dictionary Structure can change in strange ways while the GC is trying to scan it.
So, JSObject::visitChildren will now grab the object's structure's lock if the
object's structure is a dictionary. Dictionary structures are 1:1 with their object,
so this does not reduce GC parallelism (super unlikely that the GC will simultaneously
scan an object from two threads).
- The GC can blow away a Structure's property table at any time. As a small consolation,
it's now holding the Structure's lock when it does so. But there was tons of code in
Structure that uses DeferGC to prevent the GC from blowing away the property table.
This doesn't work with concurrent GC, since DeferGC only means that the GC won't run
its safepoint (i.e. stop-the-world code) in the DeferGC region. It will still do
marking and it was the Structure::visitChildren that would delete the table. It turns
out that Structure's reliance on the property table not being deleted was the product
of code rot. We already had functions that would materialize the table on demand. We
were simply making the mistake of saying:
    structure->materializePropertyMap();
    ...
    structure->propertyTable()->things
Instead of saying:
    PropertyTable* table = structure->ensurePropertyTable();
    ...
    table->things
Switching the code to use the latter idiom allowed me to simplify the code a lot while
fixing the race.
- The LLInt's get_by_val handling was broken because the indexing shape constants were
wrong. Once I started putting more things into the IndexingType, that started causing
crashes for me. So I fixed LLInt. That turned out to be a lot of work, since that code
had rotted in subtle ways.
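The mutator fencing rules listed above can be sketched with portable C++ fences. This is an illustration under assumptions, not JSC code. SketchObject and the function names are hypothetical, and a release fence stands in for a store-store fence and an acquire fence for a load-load fence (both are at least as strong as the fence they replace). The collector side is assumed to load the structure before the butterfly.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical object with the two fields the rules talk about.
struct SketchObject {
    std::atomic<std::uint32_t> structureID{0};
    std::atomic<int*> butterfly{nullptr};
};

struct SketchSnapshot {
    std::uint32_t structureID;
    int* butterfly;
};

inline void sketchStoreStoreFence() { std::atomic_thread_fence(std::memory_order_release); }
inline void sketchLoadLoadFence() { std::atomic_thread_fence(std::memory_order_acquire); }

// Mutator: install a reshaped butterfly, then the structure that describes it.
inline void sketchSetButterflyThenStructure(SketchObject& object, int* newButterfly, std::uint32_t newStructureID)
{
    // Fence before setting the butterfly: its contents are initialized first.
    sketchStoreStoreFence();
    object.butterfly.store(newButterfly, std::memory_order_relaxed);
    // Fence after setting the butterfly, which is also the fence before
    // setting the structure when the butterfly's shape changed.
    sketchStoreStoreFence();
    object.structureID.store(newStructureID, std::memory_order_relaxed);
}

// Collector (assumed ordering): structure first, fence, then butterfly. If
// the new structure is observed, the new butterfly is observed as well, so
// the pair either matches or is a newer butterfly with an older structure.
inline SketchSnapshot sketchCollectorSnapshot(const SketchObject& object)
{
    SketchSnapshot snapshot;
    snapshot.structureID = object.structureID.load(std::memory_order_relaxed);
    sketchLoadLoadFence();
    snapshot.butterfly = object.butterfly.load(std::memory_order_relaxed);
    return snapshot;
}
```

The cases where this ordering guarantee is not enough are the ones the text covers with the per-cell lock.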
This is a speed-up in SunSpider, probably because of the LLInt fix. This is neutral on
Octane and Kraken. It's a small slow-down on LongSpider, but I think we can ignore
that (we don't view LongSpider as an official benchmark). By default, the concurrent GC
is disabled: in all of the places where it would have resumed the world to run marking
concurrently to the mutator, it will just skip the resume step. When you enable
concurrent GC (--useConcurrentGC=true), it can sometimes run Octane/splay to completion.
It seems to perform quite well: on my machine, it improves both splay-throughput and
splay-latency. It's probably unstable for other programs.
* API/JSVirtualMachine.mm:
(-[JSVirtualMachine isOldExternalObject:]):
* assembler/MacroAssemblerARMv7.h:
(JSC::MacroAssemblerARMv7::storeFence):
* bytecode/InlineAccess.cpp:
(JSC::InlineAccess::dumpCacheSizesAndCrash):
(JSC::InlineAccess::generateSelfPropertyAccess):
(JSC::InlineAccess::generateArrayLength):
* bytecode/ObjectAllocationProfile.h:
(JSC::ObjectAllocationProfile::offsetOfInlineCapacity):
(JSC::ObjectAllocationProfile::ObjectAllocationProfile):
(JSC::ObjectAllocationProfile::initialize):
(JSC::ObjectAllocationProfile::inlineCapacity):
(JSC::ObjectAllocationProfile::clear):
* bytecode/PolymorphicAccess.cpp:
(JSC::AccessCase::generateWithGuard):
(JSC::AccessCase::generateImpl):
* dfg/DFGArrayifySlowPathGenerator.h:
* dfg/DFGClobberize.h:
(JSC::DFG::clobberize):
* dfg/DFGOSRExitCompiler32_64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOSRExitCompiler64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOperations.cpp:
* dfg/DFGPlan.cpp:
(JSC::DFG::Plan::markCodeBlocks):
(JSC::DFG::Plan::rememberCodeBlocks):
* dfg/DFGPlan.h:
* dfg/DFGSpeculativeJIT.cpp:
(JSC::DFG::SpeculativeJIT::emitAllocateRawObject):
(JSC::DFG::SpeculativeJIT::checkArray):
(JSC::DFG::SpeculativeJIT::arrayify):
(JSC::DFG::SpeculativeJIT::compileMakeRope):
(JSC::DFG::SpeculativeJIT::compileNewFunctionCommon):
(JSC::DFG::SpeculativeJIT::compileCreateActivation):
(JSC::DFG::SpeculativeJIT::compileCreateDirectArguments):
(JSC::DFG::SpeculativeJIT::compileSpread):
(JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileNewStringObject):
(JSC::DFG::SpeculativeJIT::compileNewTypedArray):
(JSC::DFG::SpeculativeJIT::compileStoreBarrier):
* dfg/DFGSpeculativeJIT64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
(JSC::DFG::SpeculativeJIT::compileAllocateNewArrayWithSize):
* dfg/DFGTierUpCheckInjectionPhase.cpp:
(JSC::DFG::TierUpCheckInjectionPhase::run):
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::markCodeBlocks):
(JSC::DFG::Worklist::rememberCodeBlocks):
(JSC::DFG::markCodeBlocks):
(JSC::DFG::completeAllPlansForVM):
(JSC::DFG::rememberCodeBlocks):
* dfg/DFGWorklist.h:
* ftl/FTLAbstractHeapRepository.cpp:
(JSC::FTL::AbstractHeapRepository::AbstractHeapRepository):
(JSC::FTL::AbstractHeapRepository::computeRangesAndDecorateInstructions):
* ftl/FTLAbstractHeapRepository.h:
* ftl/FTLJITCode.cpp:
(JSC::FTL::JITCode::~JITCode):
* ftl/FTLLowerDFGToB3.cpp:
(JSC::FTL::DFG::LowerDFGToB3::compilePutStructure):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::compileNewFunction):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateDirectArguments):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateRest):
(JSC::FTL::DFG::LowerDFGToB3::compileNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArray):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayBuffer):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSize):
(JSC::FTL::DFG::LowerDFGToB3::compileNewTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::compileMakeRope):
(JSC::FTL::DFG::LowerDFGToB3::compileMultiPutByOffset):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::splatWords):
(JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::reallocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::allocateObject):
(JSC::FTL::DFG::LowerDFGToB3::isArrayType):
(JSC::FTL::DFG::LowerDFGToB3::emitStoreBarrier):
(JSC::FTL::DFG::LowerDFGToB3::mutatorFence):
(JSC::FTL::DFG::LowerDFGToB3::setButterfly):
* ftl/FTLOSRExitCompiler.cpp:
(JSC::FTL::compileStub):
* ftl/FTLOutput.cpp:
(JSC::FTL::Output::signExt32ToPtr):
(JSC::FTL::Output::fence):
* ftl/FTLOutput.h:
* heap/CellState.h:
* heap/GCSegmentedArray.h:
* heap/Heap.cpp:
(JSC::Heap::ResumeTheWorldScope::ResumeTheWorldScope):
(JSC::Heap::ResumeTheWorldScope::~ResumeTheWorldScope):
(JSC::Heap::Heap):
(JSC::Heap::~Heap):
(JSC::Heap::harvestWeakReferences):
(JSC::Heap::finalizeUnconditionalFinalizers):
(JSC::Heap::completeAllJITPlans):
(JSC::Heap::markToFixpoint):
(JSC::Heap::gatherStackRoots):
(JSC::Heap::beginMarking):
(JSC::Heap::visitConservativeRoots):
(JSC::Heap::visitCompilerWorklistWeakReferences):
(JSC::Heap::updateObjectCounts):
(JSC::Heap::endMarking):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::collectInThread):
(JSC::Heap::stopTheWorld):
(JSC::Heap::resumeTheWorld):
(JSC::Heap::setGCDidJIT):
(JSC::Heap::setNeedFinalize):
(JSC::Heap::setMutatorWaiting):
(JSC::Heap::clearMutatorWaiting):
(JSC::Heap::finalize):
(JSC::Heap::flushWriteBarrierBuffer):
(JSC::Heap::writeBarrierSlowPath):
(JSC::Heap::canCollect):
(JSC::Heap::reportExtraMemoryVisited):
(JSC::Heap::reportExternalMemoryVisited):
(JSC::Heap::notifyIsSafeToCollect):
(JSC::Heap::markRoots): Deleted.
(JSC::Heap::visitExternalRememberedSet): Deleted.
(JSC::Heap::visitSmallStrings): Deleted.
(JSC::Heap::visitProtectedObjects): Deleted.
(JSC::Heap::visitArgumentBuffers): Deleted.
(JSC::Heap::visitException): Deleted.
(JSC::Heap::visitStrongHandles): Deleted.
(JSC::Heap::visitHandleStack): Deleted.
(JSC::Heap::visitSamplingProfiler): Deleted.
(JSC::Heap::visitTypeProfiler): Deleted.
(JSC::Heap::visitShadowChicken): Deleted.
(JSC::Heap::traceCodeBlocksAndJITStubRoutines): Deleted.
(JSC::Heap::visitWeakHandles): Deleted.
(JSC::Heap::flushOldStructureIDTables): Deleted.
(JSC::Heap::stopAllocation): Deleted.
* heap/Heap.h:
(JSC::Heap::collectorSlotVisitor):
(JSC::Heap::mutatorMarkStack):
(JSC::Heap::mutatorShouldBeFenced):
(JSC::Heap::addressOfMutatorShouldBeFenced):
(JSC::Heap::slotVisitor): Deleted.
(JSC::Heap::notifyIsSafeToCollect): Deleted.
(JSC::Heap::barrierShouldBeFenced): Deleted.
(JSC::Heap::addressOfBarrierShouldBeFenced): Deleted.
* heap/MarkStack.cpp:
(JSC::MarkStackArray::transferTo):
* heap/MarkStack.h:
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::tryAllocateIn):
* heap/MarkedBlock.cpp:
(JSC::MarkedBlock::MarkedBlock):
(JSC::MarkedBlock::Handle::specializedSweep):
(JSC::MarkedBlock::Handle::sweep):
(JSC::MarkedBlock::Handle::sweepHelperSelectMarksMode):
(JSC::MarkedBlock::Handle::stopAllocating):
(JSC::MarkedBlock::Handle::resumeAllocating):
(JSC::MarkedBlock::aboutToMarkSlow):
(JSC::MarkedBlock::Handle::didConsumeFreeList):
(JSC::SetNewlyAllocatedFunctor::SetNewlyAllocatedFunctor): Deleted.
(JSC::SetNewlyAllocatedFunctor::operator()): Deleted.
* heap/MarkedBlock.h:
* heap/MarkedSpace.cpp:
(JSC::MarkedSpace::resumeAllocating):
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::SlotVisitor):
(JSC::SlotVisitor::~SlotVisitor):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::clearMarkStacks):
(JSC::SlotVisitor::appendJSCellOrAuxiliary):
(JSC::SlotVisitor::setMarkedAndAppendToMarkStack):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::appendToMutatorMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::donateKnownParallel):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::containsOpaqueRoot):
(JSC::SlotVisitor::donateAndDrain):
(JSC::SlotVisitor::mergeOpaqueRoots):
(JSC::SlotVisitor::dump):
(JSC::SlotVisitor::clearMarkStack): Deleted.
(JSC::SlotVisitor::opaqueRootCount): Deleted.
* heap/SlotVisitor.h:
(JSC::SlotVisitor::collectorMarkStack):
(JSC::SlotVisitor::mutatorMarkStack):
(JSC::SlotVisitor::isEmpty):
(JSC::SlotVisitor::bytesVisited):
(JSC::SlotVisitor::markStack): Deleted.
(JSC::SlotVisitor::bytesCopied): Deleted.
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::reportExtraMemoryVisited):
(JSC::SlotVisitor::reportExternalMemoryVisited):
* jit/AssemblyHelpers.cpp:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
* jit/AssemblyHelpers.h:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
(JSC::AssemblyHelpers::barrierStoreLoadFence):
(JSC::AssemblyHelpers::mutatorFence):
(JSC::AssemblyHelpers::storeButterfly):
(JSC::AssemblyHelpers::jumpIfMutatorFenceNotNeeded):
(JSC::AssemblyHelpers::emitInitializeInlineStorage):
(JSC::AssemblyHelpers::emitInitializeOutOfLineStorage):
(JSC::AssemblyHelpers::jumpIfBarrierStoreLoadFenceNotNeeded): Deleted.
* jit/JITInlines.h:
(JSC::JIT::emitArrayProfilingSiteWithCell):
* jit/JITOperations.cpp:
* jit/JITPropertyAccess.cpp:
(JSC::JIT::emit_op_put_to_scope):
(JSC::JIT::emit_op_put_to_arguments):
* llint/LLIntData.cpp:
(JSC::LLInt::Data::performAssertions):
* llint/LowLevelInterpreter.asm:
* llint/LowLevelInterpreter64.asm:
* runtime/ButterflyInlines.h:
(JSC::Butterfly::create):
(JSC::Butterfly::createOrGrowPropertyStorage):
* runtime/ConcurrentJITLock.h:
(JSC::GCSafeConcurrentJITLocker::NoDefer::NoDefer): Deleted.
* runtime/GenericArgumentsInlines.h:
(JSC::GenericArguments<Type>::getOwnPropertySlotByIndex):
(JSC::GenericArguments<Type>::putByIndex):
* runtime/IndexingType.h:
* runtime/JSArray.cpp:
(JSC::JSArray::unshiftCountSlowCase):
(JSC::JSArray::unshiftCountWithArrayStorage):
* runtime/JSCell.h:
(JSC::JSCell::InternalLocker::InternalLocker):
(JSC::JSCell::InternalLocker::~InternalLocker):
(JSC::JSCell::atomicCompareExchangeCellStateWeakRelaxed):
(JSC::JSCell::atomicCompareExchangeCellStateStrong):
(JSC::JSCell::indexingTypeAndMiscOffset):
(JSC::JSCell::indexingTypeOffset): Deleted.
* runtime/JSCellInlines.h:
(JSC::JSCell::JSCell):
(JSC::JSCell::finishCreation):
(JSC::JSCell::indexingTypeAndMisc):
(JSC::JSCell::indexingType):
(JSC::JSCell::setStructure):
(JSC::JSCell::callDestructor):
(JSC::JSCell::lockInternalLock):
(JSC::JSCell::unlockInternalLock):
* runtime/JSObject.cpp:
(JSC::JSObject::visitButterfly):
(JSC::JSObject::visitChildren):
(JSC::JSFinalObject::visitChildren):
(JSC::JSObject::enterDictionaryIndexingModeWhenArrayStorageAlreadyExists):
(JSC::JSObject::createInitialUndecided):
(JSC::JSObject::createInitialInt32):
(JSC::JSObject::createInitialDouble):
(JSC::JSObject::createInitialContiguous):
(JSC::JSObject::createArrayStorage):
(JSC::JSObject::convertUndecidedToArrayStorage):
(JSC::JSObject::convertInt32ToArrayStorage):
(JSC::JSObject::convertDoubleToArrayStorage):
(JSC::JSObject::convertContiguousToArrayStorage):
(JSC::JSObject::deleteProperty):
(JSC::JSObject::defineOwnIndexedProperty):
(JSC::JSObject::increaseVectorLength):
(JSC::JSObject::ensureLengthSlow):
(JSC::JSObject::reallocateAndShrinkButterfly):
(JSC::JSObject::allocateMoreOutOfLineStorage):
(JSC::JSObject::shiftButterflyAfterFlattening):
(JSC::JSObject::growOutOfLineStorage): Deleted.
* runtime/JSObject.h:
(JSC::JSFinalObject::JSFinalObject):
(JSC::JSObject::setButterfly):
(JSC::JSObject::getOwnNonIndexPropertySlot):
(JSC::JSObject::fillCustomGetterPropertySlot):
(JSC::JSObject::getOwnPropertySlot):
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::setStructureAndButterfly): Deleted.
(JSC::JSObject::setButterflyWithoutChangingStructure): Deleted.
(JSC::JSObject::putDirectInternal): Deleted.
(JSC::JSObject::putDirectWithoutTransition): Deleted.
* runtime/JSObjectInlines.h:
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::getNonIndexPropertySlot):
(JSC::JSObject::putDirectWithoutTransition):
(JSC::JSObject::putDirectInternal):
* runtime/Options.h:
* runtime/SparseArrayValueMap.h:
* runtime/Structure.cpp:
(JSC::Structure::dumpStatistics):
(JSC::Structure::findStructuresAndMapForMaterialization):
(JSC::Structure::materializePropertyTable):
(JSC::Structure::addNewPropertyTransition):
(JSC::Structure::changePrototypeTransition):
(JSC::Structure::attributeChangeTransition):
(JSC::Structure::toDictionaryTransition):
(JSC::Structure::takePropertyTableOrCloneIfPinned):
(JSC::Structure::nonPropertyTransition):
(JSC::Structure::isSealed):
(JSC::Structure::isFrozen):
(JSC::Structure::flattenDictionaryStructure):
(JSC::Structure::pin):
(JSC::Structure::pinForCaching):
(JSC::Structure::willStoreValueSlow):
(JSC::Structure::copyPropertyTableForPinning):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::getPropertyNamesFromStructure):
(JSC::Structure::visitChildren):
(JSC::Structure::materializePropertyMap): Deleted.
(JSC::Structure::addPropertyWithoutTransition): Deleted.
(JSC::Structure::removePropertyWithoutTransition): Deleted.
(JSC::Structure::copyPropertyTable): Deleted.
(JSC::Structure::createPropertyMap): Deleted.
(JSC::PropertyTable::checkConsistency): Deleted.
(JSC::Structure::checkConsistency): Deleted.
* runtime/Structure.h:
* runtime/StructureIDBlob.h:
(JSC::StructureIDBlob::StructureIDBlob):
(JSC::StructureIDBlob::indexingTypeIncludingHistory):
(JSC::StructureIDBlob::setIndexingTypeIncludingHistory):
(JSC::StructureIDBlob::indexingTypeIncludingHistoryOffset):
(JSC::StructureIDBlob::indexingType): Deleted.
(JSC::StructureIDBlob::setIndexingType): Deleted.
(JSC::StructureIDBlob::indexingTypeOffset): Deleted.
* runtime/StructureInlines.h:
(JSC::Structure::get):
(JSC::Structure::checkOffsetConsistency):
(JSC::Structure::checkConsistency):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::addPropertyWithoutTransition):
(JSC::Structure::removePropertyWithoutTransition):
(JSC::Structure::setPropertyTable):
(JSC::Structure::putWillGrowOutOfLineStorage): Deleted.
(JSC::Structure::propertyTable): Deleted.
(JSC::Structure::suggestedNewOutOfLineStorageCapacity): Deleted.
Source/WTF:
The reason why I went to such great pains to make WTF::Lock fit in two bits is that I
knew that I would eventually need to stuff one into some miscellaneous bits of the
JSCell header. That time has come, because the concurrent GC has numerous race
conditions in visitChildren that can be trivially fixed if each object just has an
internal lock. Some cell types might use it to simply protect their entire visitChildren
function and anything that mutates the fields it touches, while other cell types might
use it as a "lock of last resort" to handle corner cases of an otherwise wait-free or
lock-free algorithm. Right now, it's used to protect certain transformations involving
indexing storage.
To make this happen, I factored the WTF::Lock algorithm into a LockAlgorithm struct that
is templatized on lock type (uint8_t for WTF::Lock), the isHeldBit value (1 for
WTF::Lock), and the hasParkedBit value (2 for WTF::Lock). This could have been done as
a templatized Lock class that basically contains Atomic<LockType>. You could then make
any field into a lock by bitwise_casting it to TemplateLock<field type, bit1, bit2>. But
this felt too dirty, so instead, LockAlgorithm has static methods that take
Atomic<LockType>& as their first argument. I think that this makes it more natural to
project a LockAlgorithm onto an existing Atomic<> field. Sadly, some places have to cast
their non-Atomic<> field to Atomic<> in order for this to work. Like so many other things
we do, this just shows that the C++ style of labeling fields that are subject to atomic
ops as atomic is counterproductive. Maybe some day I'll change LockAlgorithm to use our
other Atomics API, which does not require Atomic<>.
WTF::Lock now uses LockAlgorithm. The slow paths are still outlined. I don't feel too
bad about the LockAlgorithm.h header being included in so many places because we change
that algorithm so infrequently.
Also, I added a hasElapsed(time) function. This function makes it so much more natural
to write timeslicing code, which the concurrent GC has to do a lot of.
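As a rough illustration of why a hasElapsed(time) predicate makes timeslicing code read naturally, here is a sketch built on std::chrono. It is not the WTF function: the text does not show WTF's signature, so sketchHasElapsed and drainForTimeslice are hypothetical names, and std::chrono::steady_clock stands in for WTF's time types.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

using SketchClock = std::chrono::steady_clock;

// True once the given point in time has been reached.
inline bool sketchHasElapsed(SketchClock::time_point time)
{
    return SketchClock::now() >= time;
}

// Processes queued work items until the queue is empty or the timeslice is
// used up, whichever comes first. Returns how many items were processed.
inline std::size_t drainForTimeslice(std::vector<int>& work, SketchClock::duration timeslice)
{
    const SketchClock::time_point deadline = SketchClock::now() + timeslice;
    std::size_t processed = 0;
    while (!work.empty() && !sketchHasElapsed(deadline)) {
        work.pop_back(); // stand-in for doing one unit of work
        ++processed;
    }
    return processed;
}
```

The loop condition states the policy directly: keep working while there is work and the deadline has not passed.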
* WTF.xcodeproj/project.pbxproj:
* wtf/CMakeLists.txt:
* wtf/ListDump.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl): Deleted.
* wtf/Lock.h:
(WTF::LockBase::lock):
(WTF::LockBase::tryLock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(): Deleted.
* wtf/LockAlgorithm.h: Added.
(WTF::LockAlgorithm::lockFastAssumingZero):
(WTF::LockAlgorithm::lockFast):
(WTF::LockAlgorithm::lock):
(WTF::LockAlgorithm::tryLock):
(WTF::LockAlgorithm::unlockFastAssumingZero):
(WTF::LockAlgorithm::unlockFast):
(WTF::LockAlgorithm::unlock):
(WTF::LockAlgorithm::unlockFairly):
(WTF::LockAlgorithm::isLocked):
(WTF::LockAlgorithm::lockSlow):
(WTF::LockAlgorithm::unlockSlow):
* wtf/TimeWithDynamicClockType.cpp:
(WTF::hasElapsed):
* wtf/TimeWithDynamicClockType.h:
Canonical link: https://commits.webkit.org/182434@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@208720 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-11-15 01:49:22 +00:00
return DefaultLockAlgorithm::tryLock(m_byte);
2015-08-15 00:14:52 +00:00
}
// Need this version for std::unique_lock.
2021-05-30 20:35:59 +00:00
bool try_lock() WTF_ACQUIRES_LOCK_IF(true)
2015-08-15 00:14:52 +00:00
{
return tryLock();
}
2021-05-30 20:35:59 +00:00
WTF_EXPORT_PRIVATE bool tryLockWithTimeout(Seconds timeout) WTF_ACQUIRES_LOCK_IF(true);
WTF::Lock should be fair eventually
https://bugs.webkit.org/show_bug.cgi?id=159384
Reviewed by Geoffrey Garen.
Source/WTF:
In https://webkit.org/blog/6161/locking-in-webkit/ we showed how relaxing the fairness of
locks makes them fast. That post presented lock fairness as a trade-off between two
extremes:
- Barging. A barging lock, like WTF::Lock, releases the lock in unlock() even if there was a
thread on the queue. If there was a thread on the queue, the lock is released and that
thread is made runnable. That thread may then grab the lock, or some other thread may grab
the lock first (it may barge). Usually, the barging thread is the thread that released the
lock in the first place. This maximizes throughput but hurts fairness. There is no good
theoretical bound on how unfair the lock may become, but empirical data suggests that it's
fair enough for the cases we previously measured.
- FIFO. A FIFO lock, like HandoffLock in ToyLocks.h, does not release the lock in unlock()
if there is a thread waiting. If there is a thread waiting, unlock() will make that thread
runnable and inform it that it now holds the lock. This ensures perfect round-robin
fairness and allows us to reason theoretically about how long it may take for a thread to
grab the lock. For example, if we know that only N threads are running and each one may
contend on a critical section, and each one may hold the lock for at most S seconds, then
the time it takes to grab the lock is at most N * S. Unfortunately, FIFO locks perform very badly
in most cases. This is because for the common case of short critical sections, they force
a context switch after each critical section if the lock is contended.
This change makes WTF::Lock almost as fair as FIFO while still being as fast as barging.
Thanks to this new algorithm, you can now have both of these things at the same time.
This change makes WTF::Lock eventually fair. We can almost (more on the caveats below)
guarantee that the time it takes to grab a lock is N * max(1ms, S). In other words, critical
sections that are longer than 1ms are always fair. For shorter critical sections, the amount
of time that any thread waits is 1ms times the number of threads. There are some caveats
that arise from our use of randomness, but even then, in the limit as the critical section
length goes to infinity, the lock becomes fair. The corner cases are unlikely to happen; our
experiments show that the lock becomes exactly as fair as a FIFO lock for any critical
section that is 1ms or longer.
The fairness mechanism is broken into two parts. WTF::Lock can now choose to unlock a lock
fairly or unfairly thanks to the new ParkingLot token mechanism. WTF::Lock knows when to use
fair unlocking based on a timeout mechanism in ParkingLot called timeToBeFair.
ParkingLot::unparkOne() and ParkingLot::parkConditionally() can now communicate with each
other via a token. unparkOne() can pass a token, which parkConditionally() will return. This
change also makes parkConditionally() a lot more precise about when it was unparked due to a
call to unparkOne(). If unparkOne() is told that a thread was unparked then this thread is
guaranteed to report that it was unparked rather than timing out, and that thread is
guaranteed to get the token that unparkOne() passed. The token is an intptr_t. We use it as
a boolean variable in WTF::Lock, but you could use it to pass arbitrary data structures. By
default, the token is zero. WTF::Lock's unlock() will pass 1 as the token if it is doing
fair unlocking. In that case, unlock() will not release the lock, and lock() will know that
it holds the lock as soon as parkConditionally() returns. Note that this algorithm relies
on unparkOne() invoking WTF::Lock's callback while the queue lock is held, so that WTF::Lock
can make a decision about unlock strategy and inject a token while it has complete knowledge
over the state of the queue. As such, it's not immediately obvious how to implement this
algorithm on top of futexes. You really need ParkingLot!
WTF::Lock does not use fair unlocking every time. We expose a new API, Lock::unlockFairly(),
which forces the fair unlocking behavior. Additionally, ParkingLot now maintains a
per-bucket stochastic fairness timeout. When the timeout fires, the unparkOne() callback
sees UnparkResult::timeToBeFair = true. This timeout is set to be anywhere from 0ms to 1ms
at random. When a dequeue happens and there are threads that actually get dequeued, we check
if the time since the last fair unlock (the last time timeToBeFair was set to true) is
more than the timeout amount. If so, then we set timeToBeFair to true and reset the timeout.
This means that in the absence of ParkingLot collisions, fair unlocking is guaranteed to
happen at least once per millisecond. It will happen at 2 kHz on average. If there are
collisions, then each collision adds one millisecond to the worst case (and 0.5 ms to the
average case). The reason why we don't just use a fixed 1ms timeout is that we want to avoid
resonance. Imagine a program in which some thread acquires a lock at 1 kHz in-phase with the
timeToBeFair timeout. Then this thread would be the beneficiary of fairness to the detriment
of everyone else. Randomness ensures that we aren't too fair to any one thread.
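The timeout decision described above can be sketched as follows. This is a simplified illustration under assumptions, not ParkingLot's code: FairnessTimerSketch, shouldBeFair and the use of std::mt19937 are all hypothetical choices, and the caller is assumed to invoke it only when a dequeue actually removed a thread.

```cpp
#include <chrono>
#include <random>

// Hypothetical per-bucket fairness state; names are illustrative only.
class FairnessTimerSketch {
public:
    using Clock = std::chrono::steady_clock;

    explicit FairnessTimerSketch(unsigned seed)
        : m_random(seed)
    {
    }

    // Returns the value that would be reported as timeToBeFair for a dequeue
    // happening at time `now`.
    bool shouldBeFair(Clock::time_point now)
    {
        if (now <= m_nextFairTime)
            return false;
        // Reset the timeout to a random delay in [0ms, 1ms) so that no thread
        // can stay in phase with the fairness deadline.
        std::uniform_real_distribution<double> jitter(0.0, 1.0);
        std::chrono::duration<double, std::milli> delay(jitter(m_random));
        m_nextFairTime = now + std::chrono::duration_cast<Clock::duration>(delay);
        return true;
    }

private:
    std::mt19937 m_random;
    Clock::time_point m_nextFairTime { }; // Starts at the clock's epoch.
};
```

Because the delay is uniform over one millisecond, its mean is 0.5ms, which is where the 2 kHz average rate quoted above comes from.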
Empirically, this is neutral on our major benchmarks like JetStream but it's an enormous
improvement in LockFairnessTest. It's common for an unfair lock (either our BargingLock, the
old WTF::Lock, any of the other futex-based locks that barge, or the new os_unfair_lock) to
allow only one thread to hold the lock during a whole second in which each thread is holding
the lock for 1ms at a time. This is because in a barging lock, releasing a lock after
holding it for 1ms and then reacquiring it immediately virtually ensures that none of the
other threads can wake up in time to grab it before it's relocked. But the new WTF::Lock
handles this case like a champ: each thread gets equal turns.
Here's some data. If we launch 10 threads and have each of them run for 1 second while
repeatedly holding a critical section for 1ms, then here's how many times each thread gets
to hold the lock using the old WTF::Lock algorithm:
799, 6, 1, 1, 1, 1, 1, 1, 1, 1
One thread hogged the lock for almost the whole time! With the new WTF::Lock, the lock
becomes totally fair:
80, 79, 79, 79, 79, 79, 79, 80, 80, 79
I don't know of anyone creating such an automatically-fair adaptive lock before, so I think
that this is a pretty awesome advancement to the state of the art!
This change is good for three reasons:
- We do have long critical sections in WebKit and we don't want to have to worry about
starvation. This reduces the likelihood that we will see starvation due to our lock
strategy.
- I was talking to ggaren about bmalloc's locking needs, and he wanted unlockFairly() or
lockFairly() or some moral equivalent for the scavenger thread.
- If we use a WTF::Lock to manage heap access in a multithreaded GC, we'll need the ability
to unlock and relock without barging.
* benchmarks/LockFairnessTest.cpp:
(main):
* benchmarks/ToyLocks.h:
* wtf/Condition.h:
(WTF::ConditionBase::waitUntil):
(WTF::ConditionBase::notifyOne):
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl):
* wtf/Lock.h:
(WTF::LockBase::try_lock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(WTF::LockBase::isFullyReset):
* wtf/ParkingLot.cpp:
(WTF::ParkingLot::parkConditionallyImpl):
(WTF::ParkingLot::unparkOne):
(WTF::ParkingLot::unparkOneImpl):
(WTF::ParkingLot::unparkAll):
* wtf/ParkingLot.h:
(WTF::ParkingLot::parkConditionally):
(WTF::ParkingLot::compareAndPark):
(WTF::ParkingLot::unparkOne):
Tools:
* TestWebKitAPI/Tests/WTF/ParkingLot.cpp:
Canonical link: https://commits.webkit.org/178039@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@203350 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-07-18 18:32:52 +00:00
// Relinquish the lock. Either one of the threads that were waiting for the lock, or some other
// thread that happens to be running, will be able to grab the lock. This bit of unfairness is
// called barging, and we allow it because it maximizes throughput. However, we bound how unfair
// barging can get by ensuring that every once in a while, when there is a thread waiting on the
// lock, we hand the lock to that thread directly. Every time unlock() finds a thread waiting,
// we check if the last time that we did a fair unlock was more than roughly 1ms ago; if so, we
// unlock fairly. Fairness matters most for long critical sections, and this virtually
// guarantees that long critical sections always get a fair lock.
2021-05-30 20:35:59 +00:00
void unlock() WTF_RELEASES_LOCK()
Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
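The three levels of adaptation can be sketched in portable C++ like this. It is only an illustration of the control flow: AdaptiveLockSketch is a hypothetical class, the spin limit of 40 is an arbitrary assumption, and sleeping is done with a std::mutex and std::condition_variable instead of the tagged-pointer queue described above, so it has none of the space benefits of the real Lock.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

class AdaptiveLockSketch {
public:
    // Level 1: a single CAS acquires a lock that isn't held.
    bool tryLock()
    {
        bool expected = false;
        return m_isHeld.compare_exchange_strong(expected, true);
    }

    void lock()
    {
        if (tryLock())
            return;
        lockSlow();
    }

    void unlock()
    {
        m_isHeld.store(false);
        // Only pay for a wakeup if some thread has gone to sleep.
        if (m_sleepers.load()) {
            std::lock_guard<std::mutex> guard(m_sleepMutex);
            m_wakeup.notify_one();
        }
    }

    bool isHeld() const { return m_isHeld.load(); }

private:
    void lockSlow()
    {
        // Level 2: a short burst of spinning handles microcontention.
        for (unsigned i = 0; i < spinLimit; ++i) {
            std::this_thread::yield();
            if (tryLock())
                return;
        }
        // Level 3: stop burning CPU and sleep until an unlock() wakes us.
        // m_sleepers is incremented before the final tryLock() so that an
        // unlock() racing with us either lets tryLock() succeed or sees a
        // nonzero sleeper count and notifies.
        std::unique_lock<std::mutex> guard(m_sleepMutex);
        ++m_sleepers;
        while (!tryLock())
            m_wakeup.wait(guard);
        --m_sleepers;
    }

    static constexpr unsigned spinLimit = 40; // Arbitrary value for this sketch.
    std::atomic<bool> m_isHeld { false };
    std::atomic<unsigned> m_sleepers { 0 };
    std::mutex m_sleepMutex;
    std::condition_variable m_wakeup;
};
```

Like the lock described here, this sketch allows barging: a woken sleeper has to win the CAS again and may lose to a thread that is still running.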
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
This patch replaces all uses of SpinLock with Lock, since our critical sections are not
no-ops so if you do basically anything in your critical section, the Lock overhead will be
invisible. Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
as it should be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space benefits
of having a lock that just uses a byte are still better than the CPU wastage benefits of
Lock. But, this work will enable some future work to create locks that will fit in just 1.6
bits: https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to slow path spuriously, it may go there even if
there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal with the
possibility of a null queue head.
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
{
The GC should be optionally concurrent and disabled by default
https://bugs.webkit.org/show_bug.cgi?id=164454
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
This started out as a patch to have the GC scan the stack at the end, and then the
outage happened and I decided to pick a more aggressive target: give the GC a concurrent
mode that can be enabled at runtime, and whose only effect is that it turns on the
ResumeTheWorldScope. This gives our GC a really intuitive workflow: by default, the GC
thread is running solo with the world stopped and the parallel markers converged and
waiting. We have a parallel work scope to enable the parallel markers and now we have a
ResumeTheWorldScope that will optionally resume the world and then stop it again.
It's easy to make a concurrent GC that always instantly crashes. I can't promise that
this one won't do that when you run it. I set a specific goal: I wanted to do >10
concurrent GCs in debug mode with generations, optimizing JITs, and parallel marking
disabled.
To reach this milestone, I needed to do a bunch of stuff:
- The mutator needs a separate mark stack for the barrier, since it will mutate this
stack concurrently to the collector's slot visitors.
- The use of CellState to indicate whether an object is being scanned the first time or
a subsequent time was racy. It fails spectacularly when a barrier is fired at the same
time as visitChildren is running or if the barrier runs at the same time as the GC
marks the same object. So, I split SlotVisitor's mark stacks. It's now the case that
you know why you're being scanned by looking at which stack you came off of.
- All of root marking must be in the collector fixpoint. I renamed markRoots to
markToFixpoint. They say concurrency is hard, but the collector looks more intuitive
this way. We never gained anything from forcing people to make a choice between
scanning something in the fixpoint versus outside of it. Because root scanning is
cheap, we can afford to do it repeatedly, which means all root scanning can now do
constraint-based marking (like: I'll mark you if that thing is marked).
- JSObject::visitChildren's scanning of the butterfly raced with property additions,
indexed storage transitions and resizing, and a bunch of miscellaneous dirty butterfly
reshaping functions - like the one that flattens a dictionary and some sneaky
ArrayStorage transformations. Many of these can be fixed by using store-store fences
in the mutator and load-load fences in the collector. I've adopted the rule that the
collector must always see either a butterfly and structure that match or a newer
butterfly with an older structure, where their age is just one transition apart. This
can be achieved with fences. For the cases where it breaks down, I added a lock to
every JSCell. This is a full-fledged WTF lock that we sneak into two available bits in
the indexingType. See the WTF ChangeLog for details.
The mutator fencing rules are as follows:
- Store-store fence before and after setting the butterfly.
- Store-store fence before setting structure if you had changed the shape of the
butterfly.
- Store-store fence after initializing all fields in an allocation.
- A dictionary Structure can change in strange ways while the GC is trying to scan it.
So, JSObject::visitChildren will now grab the object's structure's lock if the
object's structure is a dictionary. Dictionary structures are 1:1 with their object,
so this does not reduce GC parallelism (super unlikely that the GC will simultaneously
scan an object from two threads).
- The GC can blow away a Structure's property table at any time. As a small consolation,
it's now holding the Structure's lock when it does so. But there was tons of code in
Structure that uses DeferGC to prevent the GC from blowing away the property table.
This doesn't work with concurrent GC, since DeferGC only means that the GC won't run
its safepoint (i.e. stop-the-world code) in the DeferGC region. It will still do
marking and it was the Structure::visitChildren that would delete the table. It turns
out that Structure's reliance on the property table not being deleted was the product
of code rot. We already had functions that would materialize the table on demand. We
were simply making the mistake of saying:
structure->materializePropertyMap();
...
structure->propertyTable()->things
Instead of saying:
PropertyTable* table = structure->ensurePropertyTable();
...
table->things
Switching the code to use the latter idiom allowed me to simplify the code a lot while
fixing the race.
- The LLInt's get_by_val handling was broken because the indexing shape constants were
wrong. Once I started putting more things into the IndexingType, that started causing
crashes for me. So I fixed LLInt. That turned out to be a lot of work, since that code
had rotted in subtle ways.
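The butterfly/structure fencing rule from the list above can be sketched with standard C++ fences. This is an illustration under assumptions rather than JSC's code: ObjectSketch, ButterflySketch and the helper names are hypothetical, std::atomic_thread_fence(release/acquire) is used as a conservative stand-in for store-store and load-load fences, and the collector is assumed to read the structure before the butterfly.

```cpp
#include <atomic>
#include <cstdint>

struct ButterflySketch {
    unsigned capacity;
};

struct ObjectSketch {
    std::atomic<uint32_t> structureID { 0 };
    std::atomic<ButterflySketch*> butterfly { nullptr };
};

// The standard library has no pure store-store or load-load fence, so this
// sketch uses the stronger release and acquire fences instead.
inline void storeStoreFenceSketch() { std::atomic_thread_fence(std::memory_order_release); }
inline void loadLoadFenceSketch() { std::atomic_thread_fence(std::memory_order_acquire); }

// Mutator: publish a reshaped butterfly, then the structure that describes it.
inline void transitionSketch(ObjectSketch& object, ButterflySketch* newButterfly, uint32_t newStructureID)
{
    storeStoreFenceSketch(); // Before setting the butterfly: its contents are initialized first.
    object.butterfly.store(newButterfly, std::memory_order_relaxed);
    storeStoreFenceSketch(); // After setting the butterfly, and before setting the structure.
    object.structureID.store(newStructureID, std::memory_order_relaxed);
}

struct SnapshotSketch {
    uint32_t structureID;
    ButterflySketch* butterfly;
};

// Collector: read the structure first, then the butterfly. If the new
// structure is observed, the new butterfly is observed too; otherwise the
// collector may see a newer butterfly with the older structure.
inline SnapshotSketch collectorReadSketch(const ObjectSketch& object)
{
    SnapshotSketch result;
    result.structureID = object.structureID.load(std::memory_order_relaxed);
    loadLoadFenceSketch();
    result.butterfly = object.butterfly.load(std::memory_order_relaxed);
    return result;
}
```

The ordering matters: because the mutator writes the butterfly before the structure and the collector reads them in the opposite order, the collector can never pair a new structure with a stale butterfly.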
This is a speed-up in SunSpider, probably because of the LLInt fix. This is neutral on
Octane and Kraken. It's a small slow-down on LongSpider, but I think we can ignore
that (we don't view LongSpider as an official benchmark). By default, the concurrent GC
is disabled: in all of the places where it would have resumed the world to run marking
concurrently to the mutator, it will just skip the resume step. When you enable
concurrent GC (--useConcurrentGC=true), it can sometimes run Octane/splay to completion.
It seems to perform quite well: on my machine, it improves both splay-throughput and
splay-latency. It's probably unstable for other programs.
* API/JSVirtualMachine.mm:
(-[JSVirtualMachine isOldExternalObject:]):
* assembler/MacroAssemblerARMv7.h:
(JSC::MacroAssemblerARMv7::storeFence):
* bytecode/InlineAccess.cpp:
(JSC::InlineAccess::dumpCacheSizesAndCrash):
(JSC::InlineAccess::generateSelfPropertyAccess):
(JSC::InlineAccess::generateArrayLength):
* bytecode/ObjectAllocationProfile.h:
(JSC::ObjectAllocationProfile::offsetOfInlineCapacity):
(JSC::ObjectAllocationProfile::ObjectAllocationProfile):
(JSC::ObjectAllocationProfile::initialize):
(JSC::ObjectAllocationProfile::inlineCapacity):
(JSC::ObjectAllocationProfile::clear):
* bytecode/PolymorphicAccess.cpp:
(JSC::AccessCase::generateWithGuard):
(JSC::AccessCase::generateImpl):
* dfg/DFGArrayifySlowPathGenerator.h:
* dfg/DFGClobberize.h:
(JSC::DFG::clobberize):
* dfg/DFGOSRExitCompiler32_64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOSRExitCompiler64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOperations.cpp:
* dfg/DFGPlan.cpp:
(JSC::DFG::Plan::markCodeBlocks):
(JSC::DFG::Plan::rememberCodeBlocks):
* dfg/DFGPlan.h:
* dfg/DFGSpeculativeJIT.cpp:
(JSC::DFG::SpeculativeJIT::emitAllocateRawObject):
(JSC::DFG::SpeculativeJIT::checkArray):
(JSC::DFG::SpeculativeJIT::arrayify):
(JSC::DFG::SpeculativeJIT::compileMakeRope):
(JSC::DFG::SpeculativeJIT::compileNewFunctionCommon):
(JSC::DFG::SpeculativeJIT::compileCreateActivation):
(JSC::DFG::SpeculativeJIT::compileCreateDirectArguments):
(JSC::DFG::SpeculativeJIT::compileSpread):
(JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileNewStringObject):
(JSC::DFG::SpeculativeJIT::compileNewTypedArray):
(JSC::DFG::SpeculativeJIT::compileStoreBarrier):
* dfg/DFGSpeculativeJIT64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
(JSC::DFG::SpeculativeJIT::compileAllocateNewArrayWithSize):
* dfg/DFGTierUpCheckInjectionPhase.cpp:
(JSC::DFG::TierUpCheckInjectionPhase::run):
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::markCodeBlocks):
(JSC::DFG::Worklist::rememberCodeBlocks):
(JSC::DFG::markCodeBlocks):
(JSC::DFG::completeAllPlansForVM):
(JSC::DFG::rememberCodeBlocks):
* dfg/DFGWorklist.h:
* ftl/FTLAbstractHeapRepository.cpp:
(JSC::FTL::AbstractHeapRepository::AbstractHeapRepository):
(JSC::FTL::AbstractHeapRepository::computeRangesAndDecorateInstructions):
* ftl/FTLAbstractHeapRepository.h:
* ftl/FTLJITCode.cpp:
(JSC::FTL::JITCode::~JITCode):
* ftl/FTLLowerDFGToB3.cpp:
(JSC::FTL::DFG::LowerDFGToB3::compilePutStructure):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::compileNewFunction):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateDirectArguments):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateRest):
(JSC::FTL::DFG::LowerDFGToB3::compileNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArray):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayBuffer):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSize):
(JSC::FTL::DFG::LowerDFGToB3::compileNewTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::compileMakeRope):
(JSC::FTL::DFG::LowerDFGToB3::compileMultiPutByOffset):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::splatWords):
(JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::reallocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::allocateObject):
(JSC::FTL::DFG::LowerDFGToB3::isArrayType):
(JSC::FTL::DFG::LowerDFGToB3::emitStoreBarrier):
(JSC::FTL::DFG::LowerDFGToB3::mutatorFence):
(JSC::FTL::DFG::LowerDFGToB3::setButterfly):
* ftl/FTLOSRExitCompiler.cpp:
(JSC::FTL::compileStub):
* ftl/FTLOutput.cpp:
(JSC::FTL::Output::signExt32ToPtr):
(JSC::FTL::Output::fence):
* ftl/FTLOutput.h:
* heap/CellState.h:
* heap/GCSegmentedArray.h:
* heap/Heap.cpp:
(JSC::Heap::ResumeTheWorldScope::ResumeTheWorldScope):
(JSC::Heap::ResumeTheWorldScope::~ResumeTheWorldScope):
(JSC::Heap::Heap):
(JSC::Heap::~Heap):
(JSC::Heap::harvestWeakReferences):
(JSC::Heap::finalizeUnconditionalFinalizers):
(JSC::Heap::completeAllJITPlans):
(JSC::Heap::markToFixpoint):
(JSC::Heap::gatherStackRoots):
(JSC::Heap::beginMarking):
(JSC::Heap::visitConservativeRoots):
(JSC::Heap::visitCompilerWorklistWeakReferences):
(JSC::Heap::updateObjectCounts):
(JSC::Heap::endMarking):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::collectInThread):
(JSC::Heap::stopTheWorld):
(JSC::Heap::resumeTheWorld):
(JSC::Heap::setGCDidJIT):
(JSC::Heap::setNeedFinalize):
(JSC::Heap::setMutatorWaiting):
(JSC::Heap::clearMutatorWaiting):
(JSC::Heap::finalize):
(JSC::Heap::flushWriteBarrierBuffer):
(JSC::Heap::writeBarrierSlowPath):
(JSC::Heap::canCollect):
(JSC::Heap::reportExtraMemoryVisited):
(JSC::Heap::reportExternalMemoryVisited):
(JSC::Heap::notifyIsSafeToCollect):
(JSC::Heap::markRoots): Deleted.
(JSC::Heap::visitExternalRememberedSet): Deleted.
(JSC::Heap::visitSmallStrings): Deleted.
(JSC::Heap::visitProtectedObjects): Deleted.
(JSC::Heap::visitArgumentBuffers): Deleted.
(JSC::Heap::visitException): Deleted.
(JSC::Heap::visitStrongHandles): Deleted.
(JSC::Heap::visitHandleStack): Deleted.
(JSC::Heap::visitSamplingProfiler): Deleted.
(JSC::Heap::visitTypeProfiler): Deleted.
(JSC::Heap::visitShadowChicken): Deleted.
(JSC::Heap::traceCodeBlocksAndJITStubRoutines): Deleted.
(JSC::Heap::visitWeakHandles): Deleted.
(JSC::Heap::flushOldStructureIDTables): Deleted.
(JSC::Heap::stopAllocation): Deleted.
* heap/Heap.h:
(JSC::Heap::collectorSlotVisitor):
(JSC::Heap::mutatorMarkStack):
(JSC::Heap::mutatorShouldBeFenced):
(JSC::Heap::addressOfMutatorShouldBeFenced):
(JSC::Heap::slotVisitor): Deleted.
(JSC::Heap::notifyIsSafeToCollect): Deleted.
(JSC::Heap::barrierShouldBeFenced): Deleted.
(JSC::Heap::addressOfBarrierShouldBeFenced): Deleted.
* heap/MarkStack.cpp:
(JSC::MarkStackArray::transferTo):
* heap/MarkStack.h:
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::tryAllocateIn):
* heap/MarkedBlock.cpp:
(JSC::MarkedBlock::MarkedBlock):
(JSC::MarkedBlock::Handle::specializedSweep):
(JSC::MarkedBlock::Handle::sweep):
(JSC::MarkedBlock::Handle::sweepHelperSelectMarksMode):
(JSC::MarkedBlock::Handle::stopAllocating):
(JSC::MarkedBlock::Handle::resumeAllocating):
(JSC::MarkedBlock::aboutToMarkSlow):
(JSC::MarkedBlock::Handle::didConsumeFreeList):
(JSC::SetNewlyAllocatedFunctor::SetNewlyAllocatedFunctor): Deleted.
(JSC::SetNewlyAllocatedFunctor::operator()): Deleted.
* heap/MarkedBlock.h:
* heap/MarkedSpace.cpp:
(JSC::MarkedSpace::resumeAllocating):
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::SlotVisitor):
(JSC::SlotVisitor::~SlotVisitor):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::clearMarkStacks):
(JSC::SlotVisitor::appendJSCellOrAuxiliary):
(JSC::SlotVisitor::setMarkedAndAppendToMarkStack):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::appendToMutatorMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::donateKnownParallel):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::containsOpaqueRoot):
(JSC::SlotVisitor::donateAndDrain):
(JSC::SlotVisitor::mergeOpaqueRoots):
(JSC::SlotVisitor::dump):
(JSC::SlotVisitor::clearMarkStack): Deleted.
(JSC::SlotVisitor::opaqueRootCount): Deleted.
* heap/SlotVisitor.h:
(JSC::SlotVisitor::collectorMarkStack):
(JSC::SlotVisitor::mutatorMarkStack):
(JSC::SlotVisitor::isEmpty):
(JSC::SlotVisitor::bytesVisited):
(JSC::SlotVisitor::markStack): Deleted.
(JSC::SlotVisitor::bytesCopied): Deleted.
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::reportExtraMemoryVisited):
(JSC::SlotVisitor::reportExternalMemoryVisited):
* jit/AssemblyHelpers.cpp:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
* jit/AssemblyHelpers.h:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
(JSC::AssemblyHelpers::barrierStoreLoadFence):
(JSC::AssemblyHelpers::mutatorFence):
(JSC::AssemblyHelpers::storeButterfly):
(JSC::AssemblyHelpers::jumpIfMutatorFenceNotNeeded):
(JSC::AssemblyHelpers::emitInitializeInlineStorage):
(JSC::AssemblyHelpers::emitInitializeOutOfLineStorage):
(JSC::AssemblyHelpers::jumpIfBarrierStoreLoadFenceNotNeeded): Deleted.
* jit/JITInlines.h:
(JSC::JIT::emitArrayProfilingSiteWithCell):
* jit/JITOperations.cpp:
* jit/JITPropertyAccess.cpp:
(JSC::JIT::emit_op_put_to_scope):
(JSC::JIT::emit_op_put_to_arguments):
* llint/LLIntData.cpp:
(JSC::LLInt::Data::performAssertions):
* llint/LowLevelInterpreter.asm:
* llint/LowLevelInterpreter64.asm:
* runtime/ButterflyInlines.h:
(JSC::Butterfly::create):
(JSC::Butterfly::createOrGrowPropertyStorage):
* runtime/ConcurrentJITLock.h:
(JSC::GCSafeConcurrentJITLocker::NoDefer::NoDefer): Deleted.
* runtime/GenericArgumentsInlines.h:
(JSC::GenericArguments<Type>::getOwnPropertySlotByIndex):
(JSC::GenericArguments<Type>::putByIndex):
* runtime/IndexingType.h:
* runtime/JSArray.cpp:
(JSC::JSArray::unshiftCountSlowCase):
(JSC::JSArray::unshiftCountWithArrayStorage):
* runtime/JSCell.h:
(JSC::JSCell::InternalLocker::InternalLocker):
(JSC::JSCell::InternalLocker::~InternalLocker):
(JSC::JSCell::atomicCompareExchangeCellStateWeakRelaxed):
(JSC::JSCell::atomicCompareExchangeCellStateStrong):
(JSC::JSCell::indexingTypeAndMiscOffset):
(JSC::JSCell::indexingTypeOffset): Deleted.
* runtime/JSCellInlines.h:
(JSC::JSCell::JSCell):
(JSC::JSCell::finishCreation):
(JSC::JSCell::indexingTypeAndMisc):
(JSC::JSCell::indexingType):
(JSC::JSCell::setStructure):
(JSC::JSCell::callDestructor):
(JSC::JSCell::lockInternalLock):
(JSC::JSCell::unlockInternalLock):
* runtime/JSObject.cpp:
(JSC::JSObject::visitButterfly):
(JSC::JSObject::visitChildren):
(JSC::JSFinalObject::visitChildren):
(JSC::JSObject::enterDictionaryIndexingModeWhenArrayStorageAlreadyExists):
(JSC::JSObject::createInitialUndecided):
(JSC::JSObject::createInitialInt32):
(JSC::JSObject::createInitialDouble):
(JSC::JSObject::createInitialContiguous):
(JSC::JSObject::createArrayStorage):
(JSC::JSObject::convertUndecidedToArrayStorage):
(JSC::JSObject::convertInt32ToArrayStorage):
(JSC::JSObject::convertDoubleToArrayStorage):
(JSC::JSObject::convertContiguousToArrayStorage):
(JSC::JSObject::deleteProperty):
(JSC::JSObject::defineOwnIndexedProperty):
(JSC::JSObject::increaseVectorLength):
(JSC::JSObject::ensureLengthSlow):
(JSC::JSObject::reallocateAndShrinkButterfly):
(JSC::JSObject::allocateMoreOutOfLineStorage):
(JSC::JSObject::shiftButterflyAfterFlattening):
(JSC::JSObject::growOutOfLineStorage): Deleted.
* runtime/JSObject.h:
(JSC::JSFinalObject::JSFinalObject):
(JSC::JSObject::setButterfly):
(JSC::JSObject::getOwnNonIndexPropertySlot):
(JSC::JSObject::fillCustomGetterPropertySlot):
(JSC::JSObject::getOwnPropertySlot):
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::setStructureAndButterfly): Deleted.
(JSC::JSObject::setButterflyWithoutChangingStructure): Deleted.
(JSC::JSObject::putDirectInternal): Deleted.
(JSC::JSObject::putDirectWithoutTransition): Deleted.
* runtime/JSObjectInlines.h:
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::getNonIndexPropertySlot):
(JSC::JSObject::putDirectWithoutTransition):
(JSC::JSObject::putDirectInternal):
* runtime/Options.h:
* runtime/SparseArrayValueMap.h:
* runtime/Structure.cpp:
(JSC::Structure::dumpStatistics):
(JSC::Structure::findStructuresAndMapForMaterialization):
(JSC::Structure::materializePropertyTable):
(JSC::Structure::addNewPropertyTransition):
(JSC::Structure::changePrototypeTransition):
(JSC::Structure::attributeChangeTransition):
(JSC::Structure::toDictionaryTransition):
(JSC::Structure::takePropertyTableOrCloneIfPinned):
(JSC::Structure::nonPropertyTransition):
(JSC::Structure::isSealed):
(JSC::Structure::isFrozen):
(JSC::Structure::flattenDictionaryStructure):
(JSC::Structure::pin):
(JSC::Structure::pinForCaching):
(JSC::Structure::willStoreValueSlow):
(JSC::Structure::copyPropertyTableForPinning):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::getPropertyNamesFromStructure):
(JSC::Structure::visitChildren):
(JSC::Structure::materializePropertyMap): Deleted.
(JSC::Structure::addPropertyWithoutTransition): Deleted.
(JSC::Structure::removePropertyWithoutTransition): Deleted.
(JSC::Structure::copyPropertyTable): Deleted.
(JSC::Structure::createPropertyMap): Deleted.
(JSC::PropertyTable::checkConsistency): Deleted.
(JSC::Structure::checkConsistency): Deleted.
* runtime/Structure.h:
* runtime/StructureIDBlob.h:
(JSC::StructureIDBlob::StructureIDBlob):
(JSC::StructureIDBlob::indexingTypeIncludingHistory):
(JSC::StructureIDBlob::setIndexingTypeIncludingHistory):
(JSC::StructureIDBlob::indexingTypeIncludingHistoryOffset):
(JSC::StructureIDBlob::indexingType): Deleted.
(JSC::StructureIDBlob::setIndexingType): Deleted.
(JSC::StructureIDBlob::indexingTypeOffset): Deleted.
* runtime/StructureInlines.h:
(JSC::Structure::get):
(JSC::Structure::checkOffsetConsistency):
(JSC::Structure::checkConsistency):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::addPropertyWithoutTransition):
(JSC::Structure::removePropertyWithoutTransition):
(JSC::Structure::setPropertyTable):
(JSC::Structure::putWillGrowOutOfLineStorage): Deleted.
(JSC::Structure::propertyTable): Deleted.
(JSC::Structure::suggestedNewOutOfLineStorageCapacity): Deleted.
Source/WTF:
The reason why I went to such great pains to make WTF::Lock fit in two bits is that I
knew that I would eventually need to stuff one into some miscellaneous bits of the
JSCell header. That time has come, because the concurrent GC has numerous race
conditions in visitChildren that can be trivially fixed if each object just has an
internal lock. Some cell types might use it to simply protect their entire visitChildren
function and anything that mutates the fields it touches, while other cell types might
use it as a "lock of last resort" to handle corner cases of an otherwise wait-free or
lock-free algorithm. Right now, it's used to protect certain transformations involving
indexing storage.
To make this happen, I factored the WTF::Lock algorithm into a LockAlgorithm struct that
is templatized on lock type (uint8_t for WTF::Lock), the isHeldBit value (1 for
WTF::Lock), and the hasParkedBit value (2 for WTF::Lock). This could have been done as
a templatized Lock class that basically contains Atomic<LockType>. You could then make
any field into a lock by bitwise_casting it to TemplateLock<field type, bit1, bit2>. But
this felt too dirty, so instead, LockAlgorithm has static methods that take
Atomic<LockType>& as their first argument. I think that this makes it more natural to
project a LockAlgorithm onto an existing Atomic<> field. Sadly, some places have to cast
their non-Atomic<> field to Atomic<> in order for this to work. Like so many other things
we do, this just shows that the C++ style of labeling fields that are subject to atomic
ops as atomic is counterproductive. Maybe some day I'll change LockAlgorithm to use our
other Atomics API, which does not require Atomic<>.
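The shape described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual wtf/LockAlgorithm.h: it uses std::atomic instead of WTF's Atomic<>, the name SketchLockAlgorithm is hypothetical, and the slow path simply yields instead of parking the thread. The point is that the lock state is two bits projected onto a word that somebody else owns.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical sketch: static methods that treat one bit of an existing
// atomic word as a lock. hasParkedBit is part of the template signature to
// mirror the description, but this simplified version never sets it because
// it yields instead of parking.
template<typename LockType, LockType isHeldBit, LockType hasParkedBit>
struct SketchLockAlgorithm {
    // Try to set isHeldBit, leaving every other bit of the word untouched.
    static bool tryLock(std::atomic<LockType>& word)
    {
        LockType old = word.load(std::memory_order_relaxed);
        for (;;) {
            if (old & isHeldBit)
                return false;
            LockType desired = static_cast<LockType>(old | isHeldBit);
            if (word.compare_exchange_weak(old, desired, std::memory_order_acquire, std::memory_order_relaxed))
                return true;
        }
    }

    static void lock(std::atomic<LockType>& word)
    {
        while (!tryLock(word))
            std::this_thread::yield(); // Assumption: a real slow path would park here.
    }

    static void unlock(std::atomic<LockType>& word)
    {
        LockType old = word.fetch_and(static_cast<LockType>(~isHeldBit), std::memory_order_release);
        assert(old & isHeldBit); // Unlocking a lock that is not held is a bug.
        (void)old;
    }

    static bool isLocked(const std::atomic<LockType>& word)
    {
        return word.load(std::memory_order_acquire) & isHeldBit;
    }
};
```

A caller would pick the bit positions once, for example SketchLockAlgorithm<uint8_t, 1, 2>, and pass whichever atomic field holds those bits as the first argument.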
WTF::Lock now uses LockAlgorithm. The slow paths are still outlined. I don't feel too
bad about the LockAlgorithm.h header being included in so many places because we change
that algorithm so infrequently.
Also, I added a hasElapsed(time) function. This function makes it so much more natural
to write timeslicing code, which the concurrent GC has to do a lot of.
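To illustrate why such a helper is convenient for timeslicing, here is a minimal sketch. It assumes std::chrono::steady_clock rather than TimeWithDynamicClockType, and sketchHasElapsed and runForSlice are hypothetical names; the exact comparison used by WTF::hasElapsed may differ.

```cpp
#include <cassert>
#include <chrono>

using SketchClock = std::chrono::steady_clock;

// Hypothetical stand-in for hasElapsed(time): true once the deadline has
// been reached. The second parameter exists only to make the check testable.
inline bool sketchHasElapsed(SketchClock::time_point deadline, SketchClock::time_point now = SketchClock::now())
{
    return now >= deadline;
}

// Runs units of work until the time slice is used up and returns how many
// units completed. At least one unit always runs.
template<typename Work>
unsigned runForSlice(SketchClock::duration slice, Work&& doOneUnit)
{
    SketchClock::time_point deadline = SketchClock::now() + slice;
    unsigned completed = 0;
    do {
        doOneUnit();
        ++completed;
    } while (!sketchHasElapsed(deadline));
    return completed;
}
```

The loop condition reads as plain English, which is the property the helper is after.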
* WTF.xcodeproj/project.pbxproj:
* wtf/CMakeLists.txt:
* wtf/ListDump.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl): Deleted.
* wtf/Lock.h:
(WTF::LockBase::lock):
(WTF::LockBase::tryLock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(): Deleted.
* wtf/LockAlgorithm.h: Added.
(WTF::LockAlgorithm::lockFastAssumingZero):
(WTF::LockAlgorithm::lockFast):
(WTF::LockAlgorithm::lock):
(WTF::LockAlgorithm::tryLock):
(WTF::LockAlgorithm::unlockFastAssumingZero):
(WTF::LockAlgorithm::unlockFast):
(WTF::LockAlgorithm::unlock):
(WTF::LockAlgorithm::unlockFairly):
(WTF::LockAlgorithm::isLocked):
(WTF::LockAlgorithm::lockSlow):
(WTF::LockAlgorithm::unlockSlow):
* wtf/TimeWithDynamicClockType.cpp:
(WTF::hasElapsed):
* wtf/TimeWithDynamicClockType.h:
Canonical link: https://commits.webkit.org/182434@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@208720 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-11-15 01:49:22 +00:00
if (UNLIKELY(!DefaultLockAlgorithm::unlockFastAssumingZero(m_byte)))
    unlockSlow();
Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
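The three levels of adaptation can be sketched as below. This is a simplified illustration, not the code in wtf/Lock.cpp: SketchAdaptiveLock is a hypothetical name, the spin limit is arbitrary, and a std::mutex plus std::condition_variable stands in for the queue of statically allocated thread nodes, so the sketch is much larger than sizeof(void*).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

class SketchAdaptiveLock {
public:
    void lock()
    {
        // Level 1: inlined fast path, a single CAS on a lock that is not held.
        uint8_t expected = 0;
        if (m_state.compare_exchange_strong(expected, isHeldBit, std::memory_order_acquire))
            return;
        lockSlow();
    }

    void unlock()
    {
        // Fast path: nobody is waiting, so just clear the held bit.
        uint8_t expected = isHeldBit;
        if (m_state.compare_exchange_strong(expected, 0, std::memory_order_release))
            return;
        unlockSlow();
    }

private:
    static constexpr uint8_t isHeldBit = 1;
    static constexpr uint8_t hasWaitersBit = 2;
    static constexpr unsigned spinLimit = 40; // Arbitrary value for this sketch.

    void lockSlow()
    {
        // Level 2: a short burst of spinning for microcontention.
        for (unsigned i = 0; i < spinLimit; ++i) {
            uint8_t state = m_state.load(std::memory_order_relaxed);
            if (!(state & isHeldBit)
                && m_state.compare_exchange_weak(state, static_cast<uint8_t>(state | isHeldBit), std::memory_order_acquire))
                return;
            std::this_thread::yield();
        }

        // Level 3: announce that we are waiting, then sleep until unlock() wakes us.
        std::unique_lock<std::mutex> guard(m_queueMutex);
        for (;;) {
            uint8_t state = m_state.load(std::memory_order_relaxed);
            if (!(state & isHeldBit)) {
                if (m_state.compare_exchange_weak(state, static_cast<uint8_t>(state | isHeldBit), std::memory_order_acquire))
                    return;
                continue;
            }
            if (!(state & hasWaitersBit)
                && !m_state.compare_exchange_weak(state, static_cast<uint8_t>(state | hasWaitersBit), std::memory_order_relaxed))
                continue;
            m_wake.wait(guard);
        }
    }

    void unlockSlow()
    {
        // Taking the queue mutex orders this against waiters setting hasWaitersBit.
        std::lock_guard<std::mutex> guard(m_queueMutex);
        assert(m_state.load(std::memory_order_relaxed) & isHeldBit);
        m_state.store(0, std::memory_order_release);
        m_wake.notify_all();
    }

    std::atomic<uint8_t> m_state { 0 };
    std::mutex m_queueMutex;
    std::condition_variable m_wake;
};
```

Waking every waiter is a simplification; a queue-based design would wake only the thread at the head of the queue.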
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
This patch replaces all uses of SpinLock with Lock. Our critical sections are not no-ops, and
if you do basically anything in your critical section, the Lock overhead will be invisible.
Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
as it should be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space savings
of a lock that uses just one byte still outweigh the CPU time that Lock would save. But this
work will enable some future work to create locks that will fit in just 1.6
bits: https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to slow path spuriously, it may go there even if
there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal with the
possibility of a null queue head.
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
}
WTF::Lock should be fair eventually
https://bugs.webkit.org/show_bug.cgi?id=159384
Reviewed by Geoffrey Garen.
Source/WTF:
In https://webkit.org/blog/6161/locking-in-webkit/ we showed how relaxing the fairness of
locks makes them fast. That post presented lock fairness as a trade-off between two
extremes:
- Barging. A barging lock, like WTF::Lock, releases the lock in unlock() even if there was a
thread on the queue. If there was a thread on the queue, the lock is released and that
thread is made runnable. That thread may then grab the lock, or some other thread may grab
the lock first (it may barge). Usually, the barging thread is the thread that released the
lock in the first place. This maximizes throughput but hurts fairness. There is no good
theoretical bound on how unfair the lock may become, but empirical data suggests that it's
fair enough for the cases we previously measured.
- FIFO. A FIFO lock, like HandoffLock in ToyLocks.h, does not release the lock in unlock()
if there is a thread waiting. If there is a thread waiting, unlock() will make that thread
runnable and inform it that it now holds the lock. This ensures perfect round-robin
fairness and allows us to reason theoretically about how long it may take for a thread to
grab the lock. For example, if we know that only N threads are running and each one may
contend on a critical section, and each one may hold the lock for at most S seconds, then
the time it takes to grab the lock is at most N * S. Unfortunately, FIFO locks perform very badly
in most cases. This is because for the common case of short critical sections, they force
a context switch after each critical section if the lock is contended.
This change makes WTF::Lock almost as fair as FIFO while still being as fast as barging.
Thanks to this new algorithm, you can now have both of these things at the same time.
This change makes WTF::Lock eventually fair. We can almost (more on the caveats below)
guarantee that the time it takes to grab a lock is N * max(1ms, S). In other words, critical
sections that are longer than 1ms are always fair. For shorter critical sections, the amount
of time that any thread waits is 1ms times the number of threads. There are some caveats
that arise from our use of randomness, but even then, in the limit as the critical section
length goes to infinity, the lock becomes fair. The corner cases are unlikely to happen; our
experiments show that the lock becomes exactly as fair as a FIFO lock for any critical
section that is 1ms or longer.
The fairness mechanism is broken into two parts. WTF::Lock can now choose to unlock a lock
fairly or unfairly thanks to the new ParkingLot token mechanism. WTF::Lock knows when to use
fair unlocking based on a timeout mechanism in ParkingLot called timeToBeFair.
ParkingLot::unparkOne() and ParkingLot::parkConditionally() can now communicate with each
other via a token. unparkOne() can pass a token, which parkConditionally() will return. This
change also makes parkConditionally() a lot more precise about when it was unparked due to a
call to unparkOne(). If unparkOne() is told that a thread was unparked then this thread is
guaranteed to report that it was unparked rather than timing out, and that thread is
guaranteed to get the token that unparkOne() passed. The token is an intptr_t. We use it as
a boolean variable in WTF::Lock, but you could use it to pass arbitrary data structures. By
default, the token is zero. WTF::Lock's unlock() will pass 1 as the token if it is doing
fair unlocking. In that case, unlock() will not release the lock, and lock() will know that
it holds the lock as soon as parkConditionally() returns. Note that this algorithm relies
on unparkOne() invoking WTF::Lock's callback while the queue lock is held, so that WTF::Lock
can make a decision about unlock strategy and inject a token while it has complete knowledge
over the state of the queue. As such, it's not immediately obvious how to implement this
algorithm on top of futexes. You really need ParkingLot!
WTF::Lock does not use fair unlocking every time. We expose a new API, Lock::unlockFairly(),
which forces the fair unlocking behavior. Additionally, ParkingLot now maintains a
per-bucket stochastic fairness timeout. When the timeout fires, the unparkOne() callback
sees UnparkResult::timeToBeFair = true. This timeout is set to be anywhere from 0ms to 1ms
at random. When a dequeue happens and there are threads that actually get dequeued, we check
if the time since the last fair unlock (the last time timeToBeFair was set to true) is
more than the timeout amount. If so, then we set timeToBeFair to true and reset the timeout.
This means that in the absence of ParkingLot collisions, fair unlocking is guaranteed to
happen at least once per millisecond. It will happen at 2 kHz on average. If there are
collisions, then each collision adds one millisecond to the worst case (and 0.5 ms to the
average case). The reason why we don't just use a fixed 1ms timeout is that we want to avoid
resonance. Imagine a program in which some thread acquires a lock at 1 kHz in-phase with the
timeToBeFair timeout. Then this thread would be the beneficiary of fairness to the detriment
of everyone else. Randomness ensures that we aren't too fair to any one thread.
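The stochastic timeout can be sketched in isolation like this. It is a minimal illustration under assumptions: SketchFairnessTimer is a hypothetical name, std::mt19937 and std::chrono::steady_clock stand in for whatever ParkingLot uses per bucket, and the caller passes the current time so the logic stays deterministic to test.

```cpp
#include <cassert>
#include <chrono>
#include <random>

class SketchFairnessTimer {
public:
    using Clock = std::chrono::steady_clock;

    explicit SketchFairnessTimer(Clock::time_point now, unsigned seed = 1)
        : m_random(seed)
    {
        arm(now);
    }

    // Meant to be called when a thread is actually dequeued. Returns true if
    // this unlock should be fair, and in that case picks a new random timeout.
    bool timeToBeFair(Clock::time_point now)
    {
        if (now <= m_nextFairTime)
            return false;
        arm(now);
        return true;
    }

private:
    // Chooses the next deadline uniformly between 0ms and 1ms from now, so
    // that no thread can stay in phase with the timeout.
    void arm(Clock::time_point now)
    {
        std::uniform_int_distribution<long long> microseconds(0, 1000);
        std::chrono::microseconds delay(microseconds(m_random));
        m_nextFairTime = now + std::chrono::duration_cast<Clock::duration>(delay);
    }

    std::mt19937 m_random;
    Clock::time_point m_nextFairTime;
};
```

In this sketch the unlock path would consult timeToBeFair() while holding the queue lock and hand the lock directly to the dequeued thread whenever it returns true.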
Empirically, this is neutral on our major benchmarks like JetStream but it's an enormous
improvement in LockFairnessTest. It's common for an unfair lock (either our BargingLock, the
old WTF::Lock, any of the other futex-based locks that barge, or new os_unfair_lock) to
allow only one thread to hold the lock during a whole second in which each thread is holding
the lock for 1ms at a time. This is because in a barging lock, releasing a lock after
holding it for 1ms and then reacquiring it immediately virtually ensures that none of the
other threads can wake up in time to grab it before it's relocked. But the new WTF::Lock
handles this case like a champ: each thread gets equal turns.
Here's some data. If we launch 10 threads and have each of them run for 1 second while
repeatedly holding a critical section for 1ms, then here's how many times each thread gets
to hold the lock using the old WTF::Lock algorithm:
799, 6, 1, 1, 1, 1, 1, 1, 1, 1
One thread hogged the lock for almost the whole time! With the new WTF::Lock, the lock
becomes totally fair:
80, 79, 79, 79, 79, 79, 79, 80, 80, 79
I don't know of anyone creating such an automatically-fair adaptive lock before, so I think
that this is a pretty awesome advancement to the state of the art!
This change is good for three reasons:
- We do have long critical sections in WebKit and we don't want to have to worry about
starvation. This reduces the likelihood that we will see starvation due to our lock
strategy.
- I was talking to ggaren about bmalloc's locking needs, and he wanted unlockFairly() or
lockFairly() or some moral equivalent for the scavenger thread.
- If we use a WTF::Lock to manage heap access in a multithreaded GC, we'll need the ability
to unlock and relock without barging.
* benchmarks/LockFairnessTest.cpp:
(main):
* benchmarks/ToyLocks.h:
* wtf/Condition.h:
(WTF::ConditionBase::waitUntil):
(WTF::ConditionBase::notifyOne):
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl):
* wtf/Lock.h:
(WTF::LockBase::try_lock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(WTF::LockBase::isFullyReset):
* wtf/ParkingLot.cpp:
(WTF::ParkingLot::parkConditionallyImpl):
(WTF::ParkingLot::unparkOne):
(WTF::ParkingLot::unparkOneImpl):
(WTF::ParkingLot::unparkAll):
* wtf/ParkingLot.h:
(WTF::ParkingLot::parkConditionally):
(WTF::ParkingLot::compareAndPark):
(WTF::ParkingLot::unparkOne):
Tools:
* TestWebKitAPI/Tests/WTF/ParkingLot.cpp:
Canonical link: https://commits.webkit.org/178039@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@203350 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-07-18 18:32:52 +00:00
// This is like unlock() but it guarantees that we unlock the lock fairly. For short critical
// sections, this is much slower than unlock(). For long critical sections, unlock() will learn
// to be fair anyway. However, if you plan to relock the lock right after unlocking and you want
// to ensure that some other thread runs in the meantime, this is probably the function you
// want.
2021-05-30 20:35:59 +00:00
void unlockFairly() WTF_RELEASES_LOCK()
|
|
|
{
|
The GC should be optionally concurrent and disabled by default
https://bugs.webkit.org/show_bug.cgi?id=164454
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
This started out as a patch to have the GC scan the stack at the end, and then the
outage happened and I decided to pick a more aggressive target: give the GC a concurrent
mode that can be enabled at runtime, and whose only effect is that it turns on the
ResumeTheWorldScope. This gives our GC a really intuitive workflow: by default, the GC
thread is running solo with the world stopped and the parallel markers converged and
waiting. We have a parallel work scope to enable the parallel markers and now we have a
ResumeTheWorldScope that will optionally resume the world and then stop it again.
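The scope-based workflow described above can be sketched as an RAII helper. This is a minimal
sketch, not the code in heap/Heap.cpp: ToyHeap, ToyResumeTheWorldScope, and toyCollect are
hypothetical names, and the "world" is reduced to a single boolean.

```cpp
#include <cassert>

// Hypothetical stand-in for the collector state; this is not JSC::Heap.
class ToyHeap {
public:
    explicit ToyHeap(bool useConcurrentGC)
        : m_useConcurrentGC(useConcurrentGC)
    {
    }

    bool useConcurrentGC() const { return m_useConcurrentGC; }
    bool worldIsStopped() const { return m_worldIsStopped; }

    void stopTheWorld()
    {
        assert(!m_worldIsStopped);
        m_worldIsStopped = true;
    }

    void resumeTheWorld()
    {
        assert(m_worldIsStopped);
        m_worldIsStopped = false;
    }

private:
    bool m_useConcurrentGC;
    bool m_worldIsStopped { false };
};

// Resumes the world on entry only when concurrent GC is enabled, and stops it
// again on exit. When concurrent GC is disabled the scope does nothing.
class ToyResumeTheWorldScope {
public:
    explicit ToyResumeTheWorldScope(ToyHeap& heap)
        : m_heap(heap)
        , m_didResume(heap.useConcurrentGC())
    {
        if (m_didResume)
            m_heap.resumeTheWorld();
    }

    ~ToyResumeTheWorldScope()
    {
        if (m_didResume)
            m_heap.stopTheWorld();
    }

    ToyResumeTheWorldScope(const ToyResumeTheWorldScope&) = delete;
    ToyResumeTheWorldScope& operator=(const ToyResumeTheWorldScope&) = delete;

private:
    ToyHeap& m_heap;
    bool m_didResume;
};

struct ToyCollectResult {
    bool worldRanDuringMarking;
    bool worldStoppedAfterScope;
};

// One toy collection cycle: stop the world, optionally resume it for the
// "marking" region, and observe the state inside and after that region.
inline ToyCollectResult toyCollect(ToyHeap& heap)
{
    ToyCollectResult result;
    heap.stopTheWorld();
    {
        ToyResumeTheWorldScope scope(heap);
        result.worldRanDuringMarking = !heap.worldIsStopped();
    }
    result.worldStoppedAfterScope = heap.worldIsStopped();
    heap.resumeTheWorld();
    return result;
}
```

With useConcurrentGC set to false the scope is a no-op, which mirrors the "skip the resume step"
default described later in this message.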
It's easy to make a concurrent GC that always instantly crashes. I can't promise that
this one won't do that when you run it. I set a specific goal: I wanted to do >10
concurrent GCs in debug mode with generations, optimizing JITs, and parallel marking
disabled.
To reach this milestone, I needed to do a bunch of stuff:
- The mutator needs a separate mark stack for the barrier, since it will mutate this
stack concurrently to the collector's slot visitors.
- The use of CellState to indicate whether an object is being scanned the first time or
a subsequent time was racy. It fails spectacularly when a barrier is fired at the same
time as visitChildren is running or if the barrier runs at the same time as the GC
marks the same object. So, I split SlotVisitor's mark stacks. It's now the case that
you know why you're being scanned by looking at which stack you came off of.
- All of root marking must be in the collector fixpoint. I renamed markRoots to
markToFixpoint. They say concurrency is hard, but the collector looks more intuitive
this way. We never gained anything from forcing people to make a choice between
scanning something in the fixpoint versus outside of it. Because root scanning is
cheap, we can afford to do it repeatedly, which means all root scanning can now do
constraint-based marking (like: I'll mark you if that thing is marked).
- JSObject::visitChildren's scanning of the butterfly raced with property additions,
indexed storage transitions and resizing, and a bunch of miscellaneous dirty butterfly
reshaping functions - like the one that flattens a dictionary and some sneaky
ArrayStorage transformations. Many of these can be fixed by using store-store fences
in the mutator and load-load fences in the collector. I've adopted the rule that the
collector must always see either a butterfly and structure that match or a newer
butterfly with an older structure, where their age is just one transition apart. This
can be achieved with fences. For the cases where it breaks down, I added a lock to
every JSCell. This is a full-fledged WTF lock that we sneak into two available bits in
the indexingType. See the WTF ChangeLog for details.
The mutator fencing rules are as follows:
- Store-store fence before and after setting the butterfly.
- Store-store fence before setting structure if you had changed the shape of the
butterfly.
- Store-store fence after initializing all fields in an allocation.
- A dictionary Structure can change in strange ways while the GC is trying to scan it.
So, JSObject::visitChildren will now grab the object's structure's lock if the
object's structure is a dictionary. Dictionary structures are 1:1 with their object,
so this does not reduce GC parallelism (super unlikely that the GC will simultaneously
scan an object from two threads).
- The GC can blow away a Structure's property table at any time. As a small consolation,
it's now holding the Structure's lock when it does so. But there was tons of code in
Structure that uses DeferGC to prevent the GC from blowing away the property table.
This doesn't work with concurrent GC, since DeferGC only means that the GC won't run
its safepoint (i.e. stop-the-world code) in the DeferGC region. It will still do
marking and it was Structure::visitChildren that would delete the table. It turns
out that Structure's reliance on the property table not being deleted was the product
of code rot. We already had functions that would materialize the table on demand. We
were simply making the mistake of saying:
structure->materializePropertyMap();
...
structure->propertyTable()->things
Instead of saying:
PropertyTable* table = structure->ensurePropertyTable();
...
table->things
Switching the code to use the latter idiom allowed me to simplify the code a lot while
fixing the race.
- The LLInt's get_by_val handling was broken because the indexing shape constants were
wrong. Once I started putting more things into the IndexingType, that started causing
crashes for me. So I fixed LLInt. That turned out to be a lot of work, since that code
had rotted in subtle ways.
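The mutator fencing rules for the butterfly and structure listed above can be sketched with
standard C++ atomics. This is a minimal sketch under stated assumptions: ToyObject and both
functions are hypothetical, C++ has no pure store-store or load-load fence so release and
acquire fences (which are at least as strong) stand in for them, and the collector's load order
(structure first, then butterfly) is an assumption chosen to match the stated invariant.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical object header: a structure ID plus a pointer to out-of-line storage.
struct ToyObject {
    std::atomic<uint32_t> structureID { 0 };
    std::atomic<int*> butterfly { nullptr };
};

// Mutator side: publish the new butterfly first, then the new structure.
inline void toySetButterflyThenStructure(ToyObject& object, int* newButterfly, uint32_t newStructureID)
{
    assert(newButterfly);
    // Fence before setting the butterfly: its contents are initialized before
    // the pointer becomes visible.
    std::atomic_thread_fence(std::memory_order_release);
    object.butterfly.store(newButterfly, std::memory_order_relaxed);
    // Fence after setting the butterfly and before setting the structure: the
    // butterfly store is ordered before the structure store.
    std::atomic_thread_fence(std::memory_order_release);
    object.structureID.store(newStructureID, std::memory_order_relaxed);
}

struct ToySnapshot {
    uint32_t structureID;
    int* butterfly;
};

// Collector side: load the structure, fence, then load the butterfly. If the
// new structure is observed, the new butterfly is observed too; otherwise the
// collector sees the old structure with either the old or the newer butterfly.
inline ToySnapshot toyCollectorRead(const ToyObject& object)
{
    ToySnapshot snapshot;
    snapshot.structureID = object.structureID.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    snapshot.butterfly = object.butterfly.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return snapshot;
}
```

The sketch only covers the cases that fences can handle; the cases where this ordering breaks
down are the ones that the per-cell lock is for.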
This is a speed-up in SunSpider, probably because of the LLInt fix. This is neutral on
Octane and Kraken. It's a small slow-down on LongSpider, but I think we can ignore
that (we don't view LongSpider as an official benchmark). By default, the concurrent GC
is disabled: in all of the places where it would have resumed the world to run marking
concurrently to the mutator, it will just skip the resume step. When you enable
concurrent GC (--useConcurrentGC=true), it can sometimes run Octane/splay to completion.
It seems to perform quite well: on my machine, it improves both splay-throughput and
splay-latency. It's probably unstable for other programs.
* API/JSVirtualMachine.mm:
(-[JSVirtualMachine isOldExternalObject:]):
* assembler/MacroAssemblerARMv7.h:
(JSC::MacroAssemblerARMv7::storeFence):
* bytecode/InlineAccess.cpp:
(JSC::InlineAccess::dumpCacheSizesAndCrash):
(JSC::InlineAccess::generateSelfPropertyAccess):
(JSC::InlineAccess::generateArrayLength):
* bytecode/ObjectAllocationProfile.h:
(JSC::ObjectAllocationProfile::offsetOfInlineCapacity):
(JSC::ObjectAllocationProfile::ObjectAllocationProfile):
(JSC::ObjectAllocationProfile::initialize):
(JSC::ObjectAllocationProfile::inlineCapacity):
(JSC::ObjectAllocationProfile::clear):
* bytecode/PolymorphicAccess.cpp:
(JSC::AccessCase::generateWithGuard):
(JSC::AccessCase::generateImpl):
* dfg/DFGArrayifySlowPathGenerator.h:
* dfg/DFGClobberize.h:
(JSC::DFG::clobberize):
* dfg/DFGOSRExitCompiler32_64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOSRExitCompiler64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOperations.cpp:
* dfg/DFGPlan.cpp:
(JSC::DFG::Plan::markCodeBlocks):
(JSC::DFG::Plan::rememberCodeBlocks):
* dfg/DFGPlan.h:
* dfg/DFGSpeculativeJIT.cpp:
(JSC::DFG::SpeculativeJIT::emitAllocateRawObject):
(JSC::DFG::SpeculativeJIT::checkArray):
(JSC::DFG::SpeculativeJIT::arrayify):
(JSC::DFG::SpeculativeJIT::compileMakeRope):
(JSC::DFG::SpeculativeJIT::compileNewFunctionCommon):
(JSC::DFG::SpeculativeJIT::compileCreateActivation):
(JSC::DFG::SpeculativeJIT::compileCreateDirectArguments):
(JSC::DFG::SpeculativeJIT::compileSpread):
(JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileNewStringObject):
(JSC::DFG::SpeculativeJIT::compileNewTypedArray):
(JSC::DFG::SpeculativeJIT::compileStoreBarrier):
* dfg/DFGSpeculativeJIT64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
(JSC::DFG::SpeculativeJIT::compileAllocateNewArrayWithSize):
* dfg/DFGTierUpCheckInjectionPhase.cpp:
(JSC::DFG::TierUpCheckInjectionPhase::run):
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::markCodeBlocks):
(JSC::DFG::Worklist::rememberCodeBlocks):
(JSC::DFG::markCodeBlocks):
(JSC::DFG::completeAllPlansForVM):
(JSC::DFG::rememberCodeBlocks):
* dfg/DFGWorklist.h:
* ftl/FTLAbstractHeapRepository.cpp:
(JSC::FTL::AbstractHeapRepository::AbstractHeapRepository):
(JSC::FTL::AbstractHeapRepository::computeRangesAndDecorateInstructions):
* ftl/FTLAbstractHeapRepository.h:
* ftl/FTLJITCode.cpp:
(JSC::FTL::JITCode::~JITCode):
* ftl/FTLLowerDFGToB3.cpp:
(JSC::FTL::DFG::LowerDFGToB3::compilePutStructure):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::compileNewFunction):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateDirectArguments):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateRest):
(JSC::FTL::DFG::LowerDFGToB3::compileNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArray):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayBuffer):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSize):
(JSC::FTL::DFG::LowerDFGToB3::compileNewTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::compileMakeRope):
(JSC::FTL::DFG::LowerDFGToB3::compileMultiPutByOffset):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::splatWords):
(JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::reallocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::allocateObject):
(JSC::FTL::DFG::LowerDFGToB3::isArrayType):
(JSC::FTL::DFG::LowerDFGToB3::emitStoreBarrier):
(JSC::FTL::DFG::LowerDFGToB3::mutatorFence):
(JSC::FTL::DFG::LowerDFGToB3::setButterfly):
* ftl/FTLOSRExitCompiler.cpp:
(JSC::FTL::compileStub):
* ftl/FTLOutput.cpp:
(JSC::FTL::Output::signExt32ToPtr):
(JSC::FTL::Output::fence):
* ftl/FTLOutput.h:
* heap/CellState.h:
* heap/GCSegmentedArray.h:
* heap/Heap.cpp:
(JSC::Heap::ResumeTheWorldScope::ResumeTheWorldScope):
(JSC::Heap::ResumeTheWorldScope::~ResumeTheWorldScope):
(JSC::Heap::Heap):
(JSC::Heap::~Heap):
(JSC::Heap::harvestWeakReferences):
(JSC::Heap::finalizeUnconditionalFinalizers):
(JSC::Heap::completeAllJITPlans):
(JSC::Heap::markToFixpoint):
(JSC::Heap::gatherStackRoots):
(JSC::Heap::beginMarking):
(JSC::Heap::visitConservativeRoots):
(JSC::Heap::visitCompilerWorklistWeakReferences):
(JSC::Heap::updateObjectCounts):
(JSC::Heap::endMarking):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::collectInThread):
(JSC::Heap::stopTheWorld):
(JSC::Heap::resumeTheWorld):
(JSC::Heap::setGCDidJIT):
(JSC::Heap::setNeedFinalize):
(JSC::Heap::setMutatorWaiting):
(JSC::Heap::clearMutatorWaiting):
(JSC::Heap::finalize):
(JSC::Heap::flushWriteBarrierBuffer):
(JSC::Heap::writeBarrierSlowPath):
(JSC::Heap::canCollect):
(JSC::Heap::reportExtraMemoryVisited):
(JSC::Heap::reportExternalMemoryVisited):
(JSC::Heap::notifyIsSafeToCollect):
(JSC::Heap::markRoots): Deleted.
(JSC::Heap::visitExternalRememberedSet): Deleted.
(JSC::Heap::visitSmallStrings): Deleted.
(JSC::Heap::visitProtectedObjects): Deleted.
(JSC::Heap::visitArgumentBuffers): Deleted.
(JSC::Heap::visitException): Deleted.
(JSC::Heap::visitStrongHandles): Deleted.
(JSC::Heap::visitHandleStack): Deleted.
(JSC::Heap::visitSamplingProfiler): Deleted.
(JSC::Heap::visitTypeProfiler): Deleted.
(JSC::Heap::visitShadowChicken): Deleted.
(JSC::Heap::traceCodeBlocksAndJITStubRoutines): Deleted.
(JSC::Heap::visitWeakHandles): Deleted.
(JSC::Heap::flushOldStructureIDTables): Deleted.
(JSC::Heap::stopAllocation): Deleted.
* heap/Heap.h:
(JSC::Heap::collectorSlotVisitor):
(JSC::Heap::mutatorMarkStack):
(JSC::Heap::mutatorShouldBeFenced):
(JSC::Heap::addressOfMutatorShouldBeFenced):
(JSC::Heap::slotVisitor): Deleted.
(JSC::Heap::notifyIsSafeToCollect): Deleted.
(JSC::Heap::barrierShouldBeFenced): Deleted.
(JSC::Heap::addressOfBarrierShouldBeFenced): Deleted.
* heap/MarkStack.cpp:
(JSC::MarkStackArray::transferTo):
* heap/MarkStack.h:
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::tryAllocateIn):
* heap/MarkedBlock.cpp:
(JSC::MarkedBlock::MarkedBlock):
(JSC::MarkedBlock::Handle::specializedSweep):
(JSC::MarkedBlock::Handle::sweep):
(JSC::MarkedBlock::Handle::sweepHelperSelectMarksMode):
(JSC::MarkedBlock::Handle::stopAllocating):
(JSC::MarkedBlock::Handle::resumeAllocating):
(JSC::MarkedBlock::aboutToMarkSlow):
(JSC::MarkedBlock::Handle::didConsumeFreeList):
(JSC::SetNewlyAllocatedFunctor::SetNewlyAllocatedFunctor): Deleted.
(JSC::SetNewlyAllocatedFunctor::operator()): Deleted.
* heap/MarkedBlock.h:
* heap/MarkedSpace.cpp:
(JSC::MarkedSpace::resumeAllocating):
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::SlotVisitor):
(JSC::SlotVisitor::~SlotVisitor):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::clearMarkStacks):
(JSC::SlotVisitor::appendJSCellOrAuxiliary):
(JSC::SlotVisitor::setMarkedAndAppendToMarkStack):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::appendToMutatorMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::donateKnownParallel):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::containsOpaqueRoot):
(JSC::SlotVisitor::donateAndDrain):
(JSC::SlotVisitor::mergeOpaqueRoots):
(JSC::SlotVisitor::dump):
(JSC::SlotVisitor::clearMarkStack): Deleted.
(JSC::SlotVisitor::opaqueRootCount): Deleted.
* heap/SlotVisitor.h:
(JSC::SlotVisitor::collectorMarkStack):
(JSC::SlotVisitor::mutatorMarkStack):
(JSC::SlotVisitor::isEmpty):
(JSC::SlotVisitor::bytesVisited):
(JSC::SlotVisitor::markStack): Deleted.
(JSC::SlotVisitor::bytesCopied): Deleted.
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::reportExtraMemoryVisited):
(JSC::SlotVisitor::reportExternalMemoryVisited):
* jit/AssemblyHelpers.cpp:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
* jit/AssemblyHelpers.h:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
(JSC::AssemblyHelpers::barrierStoreLoadFence):
(JSC::AssemblyHelpers::mutatorFence):
(JSC::AssemblyHelpers::storeButterfly):
(JSC::AssemblyHelpers::jumpIfMutatorFenceNotNeeded):
(JSC::AssemblyHelpers::emitInitializeInlineStorage):
(JSC::AssemblyHelpers::emitInitializeOutOfLineStorage):
(JSC::AssemblyHelpers::jumpIfBarrierStoreLoadFenceNotNeeded): Deleted.
* jit/JITInlines.h:
(JSC::JIT::emitArrayProfilingSiteWithCell):
* jit/JITOperations.cpp:
* jit/JITPropertyAccess.cpp:
(JSC::JIT::emit_op_put_to_scope):
(JSC::JIT::emit_op_put_to_arguments):
* llint/LLIntData.cpp:
(JSC::LLInt::Data::performAssertions):
* llint/LowLevelInterpreter.asm:
* llint/LowLevelInterpreter64.asm:
* runtime/ButterflyInlines.h:
(JSC::Butterfly::create):
(JSC::Butterfly::createOrGrowPropertyStorage):
* runtime/ConcurrentJITLock.h:
(JSC::GCSafeConcurrentJITLocker::NoDefer::NoDefer): Deleted.
* runtime/GenericArgumentsInlines.h:
(JSC::GenericArguments<Type>::getOwnPropertySlotByIndex):
(JSC::GenericArguments<Type>::putByIndex):
* runtime/IndexingType.h:
* runtime/JSArray.cpp:
(JSC::JSArray::unshiftCountSlowCase):
(JSC::JSArray::unshiftCountWithArrayStorage):
* runtime/JSCell.h:
(JSC::JSCell::InternalLocker::InternalLocker):
(JSC::JSCell::InternalLocker::~InternalLocker):
(JSC::JSCell::atomicCompareExchangeCellStateWeakRelaxed):
(JSC::JSCell::atomicCompareExchangeCellStateStrong):
(JSC::JSCell::indexingTypeAndMiscOffset):
(JSC::JSCell::indexingTypeOffset): Deleted.
* runtime/JSCellInlines.h:
(JSC::JSCell::JSCell):
(JSC::JSCell::finishCreation):
(JSC::JSCell::indexingTypeAndMisc):
(JSC::JSCell::indexingType):
(JSC::JSCell::setStructure):
(JSC::JSCell::callDestructor):
(JSC::JSCell::lockInternalLock):
(JSC::JSCell::unlockInternalLock):
* runtime/JSObject.cpp:
(JSC::JSObject::visitButterfly):
(JSC::JSObject::visitChildren):
(JSC::JSFinalObject::visitChildren):
(JSC::JSObject::enterDictionaryIndexingModeWhenArrayStorageAlreadyExists):
(JSC::JSObject::createInitialUndecided):
(JSC::JSObject::createInitialInt32):
(JSC::JSObject::createInitialDouble):
(JSC::JSObject::createInitialContiguous):
(JSC::JSObject::createArrayStorage):
(JSC::JSObject::convertUndecidedToArrayStorage):
(JSC::JSObject::convertInt32ToArrayStorage):
(JSC::JSObject::convertDoubleToArrayStorage):
(JSC::JSObject::convertContiguousToArrayStorage):
(JSC::JSObject::deleteProperty):
(JSC::JSObject::defineOwnIndexedProperty):
(JSC::JSObject::increaseVectorLength):
(JSC::JSObject::ensureLengthSlow):
(JSC::JSObject::reallocateAndShrinkButterfly):
(JSC::JSObject::allocateMoreOutOfLineStorage):
(JSC::JSObject::shiftButterflyAfterFlattening):
(JSC::JSObject::growOutOfLineStorage): Deleted.
* runtime/JSObject.h:
(JSC::JSFinalObject::JSFinalObject):
(JSC::JSObject::setButterfly):
(JSC::JSObject::getOwnNonIndexPropertySlot):
(JSC::JSObject::fillCustomGetterPropertySlot):
(JSC::JSObject::getOwnPropertySlot):
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::setStructureAndButterfly): Deleted.
(JSC::JSObject::setButterflyWithoutChangingStructure): Deleted.
(JSC::JSObject::putDirectInternal): Deleted.
(JSC::JSObject::putDirectWithoutTransition): Deleted.
* runtime/JSObjectInlines.h:
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::getNonIndexPropertySlot):
(JSC::JSObject::putDirectWithoutTransition):
(JSC::JSObject::putDirectInternal):
* runtime/Options.h:
* runtime/SparseArrayValueMap.h:
* runtime/Structure.cpp:
(JSC::Structure::dumpStatistics):
(JSC::Structure::findStructuresAndMapForMaterialization):
(JSC::Structure::materializePropertyTable):
(JSC::Structure::addNewPropertyTransition):
(JSC::Structure::changePrototypeTransition):
(JSC::Structure::attributeChangeTransition):
(JSC::Structure::toDictionaryTransition):
(JSC::Structure::takePropertyTableOrCloneIfPinned):
(JSC::Structure::nonPropertyTransition):
(JSC::Structure::isSealed):
(JSC::Structure::isFrozen):
(JSC::Structure::flattenDictionaryStructure):
(JSC::Structure::pin):
(JSC::Structure::pinForCaching):
(JSC::Structure::willStoreValueSlow):
(JSC::Structure::copyPropertyTableForPinning):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::getPropertyNamesFromStructure):
(JSC::Structure::visitChildren):
(JSC::Structure::materializePropertyMap): Deleted.
(JSC::Structure::addPropertyWithoutTransition): Deleted.
(JSC::Structure::removePropertyWithoutTransition): Deleted.
(JSC::Structure::copyPropertyTable): Deleted.
(JSC::Structure::createPropertyMap): Deleted.
(JSC::PropertyTable::checkConsistency): Deleted.
(JSC::Structure::checkConsistency): Deleted.
* runtime/Structure.h:
* runtime/StructureIDBlob.h:
(JSC::StructureIDBlob::StructureIDBlob):
(JSC::StructureIDBlob::indexingTypeIncludingHistory):
(JSC::StructureIDBlob::setIndexingTypeIncludingHistory):
(JSC::StructureIDBlob::indexingTypeIncludingHistoryOffset):
(JSC::StructureIDBlob::indexingType): Deleted.
(JSC::StructureIDBlob::setIndexingType): Deleted.
(JSC::StructureIDBlob::indexingTypeOffset): Deleted.
* runtime/StructureInlines.h:
(JSC::Structure::get):
(JSC::Structure::checkOffsetConsistency):
(JSC::Structure::checkConsistency):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::addPropertyWithoutTransition):
(JSC::Structure::removePropertyWithoutTransition):
(JSC::Structure::setPropertyTable):
(JSC::Structure::putWillGrowOutOfLineStorage): Deleted.
(JSC::Structure::propertyTable): Deleted.
(JSC::Structure::suggestedNewOutOfLineStorageCapacity): Deleted.
Source/WTF:
The reason why I went to such great pains to make WTF::Lock fit in two bits is that I
knew that I would eventually need to stuff one into some miscellaneous bits of the
JSCell header. That time has come, because the concurrent GC has numerous race
conditions in visitChildren that can be trivially fixed if each object just has an
internal lock. Some cell types might use it to simply protect their entire visitChildren
function and anything that mutates the fields it touches, while other cell types might
use it as a "lock of last resort" to handle corner cases of an otherwise wait-free or
lock-free algorithm. Right now, it's used to protect certain transformations involving
indexing storage.
To make this happen, I factored the WTF::Lock algorithm into a LockAlgorithm struct that
is templatized on lock type (uint8_t for WTF::Lock), the isHeldBit value (1 for
WTF::Lock), and the hasParkedBit value (2 for WTF::Lock). This could have been done as
a templatized Lock class that basically contains Atomic<LockType>. You could then make
any field into a lock by bitwise_casting it to TemplateLock<field type, bit1, bit2>. But
this felt too dirty, so instead, LockAlgorithm has static methods that take
Atomic<LockType>& as their first argument. I think that this makes it more natural to
project a LockAlgorithm onto an existing Atomic<> field. Sadly, some places have to cast
their non-Atomic<> field to Atomic<> in order for this to work. Like so many other things
we do, this just shows that the C++ style of labeling fields that are subject to atomic
ops as atomic is counterproductive. Maybe some day I'll change LockAlgorithm to use our
other Atomics API, which does not require Atomic<>.
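The shape of such a LockAlgorithm can be sketched as follows. This is a minimal sketch, not the
contents of wtf/LockAlgorithm.h: ToyLockAlgorithm is a hypothetical name, std::atomic stands in
for WTF's Atomic<>, and the slow path simply yields instead of parking, so hasParkedBit is only
carried in the template signature.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Static methods that operate on lock bits inside an existing atomic word.
// All other bits of the word are left untouched.
template<typename LockType, LockType isHeldBit, LockType hasParkedBit>
struct ToyLockAlgorithm {
    static_assert(!(isHeldBit & hasParkedBit), "the two lock bits must be distinct");

    static bool tryLock(std::atomic<LockType>& word)
    {
        LockType oldValue = word.load(std::memory_order_relaxed);
        for (;;) {
            if (oldValue & isHeldBit)
                return false;
            LockType newValue = static_cast<LockType>(oldValue | isHeldBit);
            // On failure, compare_exchange_weak reloads oldValue and we retry.
            if (word.compare_exchange_weak(oldValue, newValue, std::memory_order_acquire, std::memory_order_relaxed))
                return true;
        }
    }

    static void lock(std::atomic<LockType>& word)
    {
        // Assumption: the real slow path parks the thread and sets
        // hasParkedBit. This sketch just yields and retries.
        while (!tryLock(word))
            std::this_thread::yield();
    }

    static void unlock(std::atomic<LockType>& word)
    {
        assert(word.load(std::memory_order_relaxed) & isHeldBit);
        word.fetch_and(static_cast<LockType>(~isHeldBit), std::memory_order_release);
    }

    static bool isLocked(const std::atomic<LockType>& word)
    {
        return word.load(std::memory_order_acquire) & isHeldBit;
    }
};

// The parameters quoted above for WTF::Lock: uint8_t, isHeldBit 1, hasParkedBit 2.
using ToyByteLockAlgorithm = ToyLockAlgorithm<uint8_t, 1, 2>;
```

Because the methods take the atomic word as an argument, the same algorithm can be projected
onto a field whose remaining bits hold unrelated data, which is the property the JSCell header
use case relies on.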
WTF::Lock now uses LockAlgorithm. The slow paths are still outlined. I don't feel too
bad about the LockAlgorithm.h header being included in so many places because we change
that algorithm so infrequently.
Also, I added a hasElapsed(time) function. This function makes it so much more natural
to write timeslicing code, which the concurrent GC has to do a lot of.
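A timeslicing loop built on a hasElapsed-style predicate might look like the sketch below. It
uses std::chrono::steady_clock rather than TimeWithDynamicClockType, and toyHasElapsed and
toyRunForTimeslice are hypothetical names, so treat it as an illustration of the idiom only.

```cpp
#include <cassert>
#include <chrono>

using ToyClock = std::chrono::steady_clock;

// True once the given deadline has been reached. The second parameter exists
// so that callers and tests can supply a fixed notion of "now".
inline bool toyHasElapsed(ToyClock::time_point deadline, ToyClock::time_point now = ToyClock::now())
{
    return now >= deadline;
}

// Runs units of work until the timeslice is used up. At least one unit always
// runs. Returns the number of units that ran.
template<typename Func>
unsigned toyRunForTimeslice(std::chrono::microseconds slice, const Func& doOneUnitOfWork)
{
    assert(slice.count() >= 0);
    ToyClock::time_point deadline = ToyClock::now() + slice;
    unsigned units = 0;
    do {
        doOneUnitOfWork();
        ++units;
    } while (!toyHasElapsed(deadline));
    return units;
}
```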
* WTF.xcodeproj/project.pbxproj:
* wtf/CMakeLists.txt:
* wtf/ListDump.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl): Deleted.
* wtf/Lock.h:
(WTF::LockBase::lock):
(WTF::LockBase::tryLock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(): Deleted.
* wtf/LockAlgorithm.h: Added.
(WTF::LockAlgorithm::lockFastAssumingZero):
(WTF::LockAlgorithm::lockFast):
(WTF::LockAlgorithm::lock):
(WTF::LockAlgorithm::tryLock):
(WTF::LockAlgorithm::unlockFastAssumingZero):
(WTF::LockAlgorithm::unlockFast):
(WTF::LockAlgorithm::unlock):
(WTF::LockAlgorithm::unlockFairly):
(WTF::LockAlgorithm::isLocked):
(WTF::LockAlgorithm::lockSlow):
(WTF::LockAlgorithm::unlockSlow):
* wtf/TimeWithDynamicClockType.cpp:
(WTF::hasElapsed):
* wtf/TimeWithDynamicClockType.h:
Canonical link: https://commits.webkit.org/182434@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@208720 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-11-15 01:49:22 +00:00
|
|
|
if (UNLIKELY(!DefaultLockAlgorithm::unlockFastAssumingZero(m_byte)))
|
|
|
|
unlockFairlySlow();
|
WTF::Lock should be fair eventually
https://bugs.webkit.org/show_bug.cgi?id=159384
Reviewed by Geoffrey Garen.
Source/WTF:
In https://webkit.org/blog/6161/locking-in-webkit/ we showed how relaxing the fairness of
locks makes them fast. That post presented lock fairness as a trade-off between two
extremes:
- Barging. A barging lock, like WTF::Lock, releases the lock in unlock() even if there was a
thread on the queue. If there was a thread on the queue, the lock is released and that
thread is made runnable. That thread may then grab the lock, or some other thread may grab
the lock first (it may barge). Usually, the barging thread is the thread that released the
lock in the first place. This maximizes throughput but hurts fairness. There is no good
theoretical bound on how unfair the lock may become, but empirical data suggests that it's
fair enough for the cases we previously measured.
- FIFO. A FIFO lock, like HandoffLock in ToyLocks.h, does not release the lock in unlock()
if there is a thread waiting. If there is a thread waiting, unlock() will make that thread
runnable and inform it that it now holds the lock. This ensures perfect round-robin
fairness and allows us to reason theoretically about how long it may take for a thread to
grab the lock. For example, if we know that only N threads are running and each one may
contend on a critical section, and each one may hold the lock for at most S seconds, then
the time it takes to grab the lock is at most N * S. Unfortunately, FIFO locks perform very badly
in most cases. This is because for the common case of short critical sections, they force
a context switch after each critical section if the lock is contended.
This change makes WTF::Lock almost as fair as FIFO while still being as fast as barging.
Thanks to this new algorithm, you can now have both of these things at the same time.
This change makes WTF::Lock eventually fair. We can almost (more on the caveats below)
guarantee that the time it takes to grab a lock is N * max(1ms, S). In other words, critical
sections that are longer than 1ms are always fair. For shorter critical sections, the amount
of time that any thread waits is 1ms times the number of threads. There are some caveats
that arise from our use of randomness, but even then, in the limit as the critical section
length goes to infinity, the lock becomes fair. The corner cases are unlikely to happen; our
experiments show that the lock becomes exactly as fair as a FIFO lock for any critical
section that is 1ms or longer.
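As a worked example of the N * max(1ms, S) bound, the helper below computes it for a given
thread count and critical section length. toyWorstCaseWait is a hypothetical name, and the
function ignores the randomness caveats mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Approximate worst-case wait for N contending threads whose critical
// sections last S each: N * max(1ms, S).
inline std::chrono::microseconds toyWorstCaseWait(unsigned threads, std::chrono::microseconds criticalSection)
{
    assert(threads > 0);
    const std::chrono::microseconds fairnessQuantum { 1000 };
    return threads * std::max(fairnessQuantum, criticalSection);
}
```

For 10 threads this gives 10ms when the critical section is 100us (the 1ms term dominates) and
50ms when the critical section is 5ms (the S term dominates).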
The fairness mechanism is broken into two parts. WTF::Lock can now choose to unlock a lock
fairly or unfairly thanks to the new ParkingLot token mechanism. WTF::Lock knows when to use
fair unlocking based on a timeout mechanism in ParkingLot called timeToBeFair.
ParkingLot::unparkOne() and ParkingLot::parkConditionally() can now communicate with each
other via a token. unparkOne() can pass a token, which parkConditionally() will return. This
change also makes parkConditionally() a lot more precise about when it was unparked due to a
call to unparkOne(). If unparkOne() is told that a thread was unparked then this thread is
guaranteed to report that it was unparked rather than timing out, and that thread is
guaranteed to get the token that unparkOne() passed. The token is an intptr_t. We use it as
a boolean variable in WTF::Lock, but you could use it to pass arbitrary data structures. By
default, the token is zero. WTF::Lock's unlock() will pass 1 as the token if it is doing
fair unlocking. In that case, unlock() will not release the lock, and lock() will know that
it holds the lock as soon as parkConditionally() returns. Note that this algorithm relies
on unparkOne() invoking WTF::Lock's callback while the queue lock is held, so that WTF::Lock
can make a decision about unlock strategy and inject a token while it has complete knowledge
over the state of the queue. As such, it's not immediately obvious how to implement this
algorithm on top of futexes. You really need ParkingLot!
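The token handoff can be sketched with a single toy queue built from a mutex and condition
variables. This is not the ParkingLot API: ToyParkingQueue and its methods are hypothetical, there
is no address hashing, validation, or timeout, and the only point is that the unpark callback
runs while the queue lock is held and its return value is delivered to the thread being woken.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>

class ToyParkingQueue {
public:
    // Blocks until some thread calls unparkOne(); returns the token it passed.
    std::intptr_t park()
    {
        Waiter self;
        std::unique_lock<std::mutex> locker(m_lock);
        m_waiters.push_back(&self);
        self.condition.wait(locker, [&] { return self.unparked; });
        return self.token;
    }

    // Invokes callback(willUnpark) while the queue lock is held, so the caller
    // can choose its unlock strategy with full knowledge of the queue. The
    // value returned by the callback becomes the woken thread's token.
    template<typename Callback>
    bool unparkOne(const Callback& callback)
    {
        std::lock_guard<std::mutex> locker(m_lock);
        bool willUnpark = !m_waiters.empty();
        std::intptr_t token = callback(willUnpark);
        if (!willUnpark)
            return false;
        Waiter* waiter = m_waiters.front();
        m_waiters.pop_front();
        waiter->token = token;
        waiter->unparked = true;
        waiter->condition.notify_one();
        return true;
    }

    std::size_t waiterCount()
    {
        std::lock_guard<std::mutex> locker(m_lock);
        return m_waiters.size();
    }

private:
    struct Waiter {
        std::condition_variable condition;
        bool unparked { false };
        std::intptr_t token { 0 };
    };

    std::mutex m_lock;
    std::deque<Waiter*> m_waiters;
};

// Parks one thread, unparks it with the given token, and returns what the
// parked thread received.
inline std::intptr_t toyDemonstrateTokenHandoff(std::intptr_t tokenToPass)
{
    ToyParkingQueue queue;
    std::intptr_t received = -1;
    std::thread parker([&] { received = queue.park(); });
    while (!queue.waiterCount())
        std::this_thread::yield();
    bool didUnpark = queue.unparkOne([&](bool willUnpark) {
        return willUnpark ? tokenToPass : std::intptr_t(0);
    });
    parker.join();
    assert(didUnpark);
    (void)didUnpark;
    return received;
}
```

In the lock described above, a token of 1 would mean "the lock was handed to you", so the woken
thread would not need to compete for the lock again.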
WTF::Lock does not use fair unlocking every time. We expose a new API, Lock::unlockFairly(),
which forces the fair unlocking behavior. Additionally, ParkingLot now maintains a
per-bucket stochastic fairness timeout. When the timeout fires, the unparkOne() callback
sees UnparkResult::timeToBeFair = true. This timeout is set to be anywhere from 0ms to 1ms
at random. When a dequeue happens and there are threads that actually get dequeued, we check
if the time since the last fair unlock (the last time timeToBeFair was set to true) is
more than the timeout amount. If so, then we set timeToBeFair to true and reset the timeout.
This means that in the absence of ParkingLot collisions, fair unlocking is guaranteed to
happen at least once per millisecond. It will happen at 2 kHz on average. If there are
collisions, then each collision adds one millisecond to the worst case (and 0.5 ms to the
average case). The reason why we don't just use a fixed 1ms timeout is that we want to avoid
resonance. Imagine a program in which some thread acquires a lock at 1 kHz in-phase with the
timeToBeFair timeout. Then this thread would be the beneficiary of fairness to the detriment
of everyone else. Randomness ensures that we aren't too fair to any one thread.
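The randomized timeout can be sketched as a small timer object that is consulted on each
dequeue. This is a minimal sketch: ToyFairnessTimer is a hypothetical name, the real state lives
per ParkingLot bucket, and the choice of std::mt19937 with microsecond granularity is an
assumption made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <random>

class ToyFairnessTimer {
public:
    using Clock = std::chrono::steady_clock;

    explicit ToyFairnessTimer(Clock::time_point start, unsigned seed = 0)
        : m_random(seed)
        , m_nextFairTime(start)
    {
        pickNextFairTime(start);
    }

    // Call when a dequeue actually dequeues a thread. Returns true when the
    // random timeout has expired, and then picks a fresh timeout.
    bool timeToBeFair(Clock::time_point now)
    {
        if (now <= m_nextFairTime)
            return false;
        pickNextFairTime(now);
        return true;
    }

private:
    void pickNextFairTime(Clock::time_point now)
    {
        // Anywhere from 0ms to 1ms, chosen uniformly, to avoid resonance with
        // threads that acquire the lock at a fixed rate.
        std::uniform_int_distribution<int> timeoutInMicroseconds(0, 1000);
        m_nextFairTime = now + std::chrono::microseconds(timeoutInMicroseconds(m_random));
    }

    std::mt19937 m_random;
    Clock::time_point m_nextFairTime;
};
```

Since the timeout never exceeds 1ms, any dequeue that happens more than 1ms after the previous
fair one is told to be fair, which is where the once-per-millisecond guarantee comes from.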
Empirically, this is neutral on our major benchmarks like JetStream but it's an enormous
improvement in LockFairnessTest. It's common for an unfair lock (either our BargingLock, the
old WTF::Lock, any of the other futex-based locks that barge, or the new os_unfair_lock) to
allow only one thread to hold the lock during a whole second in which each thread is holding
the lock for 1ms at a time. This is because in a barging lock, releasing a lock after
holding it for 1ms and then reacquiring it immediately virtually ensures that none of the
other threads can wake up in time to grab it before it's relocked. But the new WTF::Lock
handles this case like a champ: each thread gets equal turns.
Here's some data. If we launch 10 threads and have each of them run for 1 second while
repeatedly holding a critical section for 1ms, then here's how many times each thread gets
to hold the lock using the old WTF::Lock algorithm:
799, 6, 1, 1, 1, 1, 1, 1, 1, 1
One thread hogged the lock for almost the whole time! With the new WTF::Lock, the lock
becomes totally fair:
80, 79, 79, 79, 79, 79, 79, 80, 80, 79
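An experiment of this shape can be sketched as follows. This is not benchmarks/LockFairnessTest.cpp:
toyMeasureFairness is a hypothetical helper, it is shown with any lock type that has lock() and
unlock(), and the hold is a busy-wait. The counts it produces depend on the lock, the scheduler,
and the machine, so no expected output is given.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Launches numThreads threads that repeatedly take the lock and hold it for
// holdTime, for roughly runTime in total. Returns how many times each thread
// got to hold the lock.
template<typename LockType>
std::vector<unsigned> toyMeasureFairness(unsigned numThreads, std::chrono::milliseconds runTime, std::chrono::microseconds holdTime)
{
    assert(numThreads > 0);
    using Clock = std::chrono::steady_clock;

    LockType lock;
    std::atomic<bool> keepGoing { true };
    std::vector<unsigned> counts(numThreads, 0);
    std::vector<std::thread> threads;

    for (unsigned i = 0; i < numThreads; ++i) {
        threads.emplace_back([&, i] {
            while (keepGoing.load()) {
                lock.lock();
                ++counts[i]; // Each thread only writes its own slot.
                Clock::time_point deadline = Clock::now() + holdTime;
                while (Clock::now() < deadline) { } // Hold the critical section.
                lock.unlock();
            }
        });
    }

    std::this_thread::sleep_for(runTime);
    keepGoing.store(false);
    for (std::thread& thread : threads)
        thread.join();
    return counts;
}
```

Calling toyMeasureFairness<std::mutex>(10, std::chrono::milliseconds(1000),
std::chrono::microseconds(1000)) matches the parameters quoted above: 10 threads, 1 second,
1ms critical sections.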
I don't know of anyone creating such an automatically-fair adaptive lock before, so I think
that this is a pretty awesome advancement to the state of the art!
This change is good for three reasons:
- We do have long critical sections in WebKit and we don't want to have to worry about
starvation. This reduces the likelihood that we will see starvation due to our lock
strategy.
- I was talking to ggaren about bmalloc's locking needs, and he wanted unlockFairly() or
lockFairly() or some moral equivalent for the scavenger thread.
- If we use a WTF::Lock to manage heap access in a multithreaded GC, we'll need the ability
to unlock and relock without barging.
* benchmarks/LockFairnessTest.cpp:
(main):
* benchmarks/ToyLocks.h:
* wtf/Condition.h:
(WTF::ConditionBase::waitUntil):
(WTF::ConditionBase::notifyOne):
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl):
* wtf/Lock.h:
(WTF::LockBase::try_lock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(WTF::LockBase::isFullyReset):
* wtf/ParkingLot.cpp:
(WTF::ParkingLot::parkConditionallyImpl):
(WTF::ParkingLot::unparkOne):
(WTF::ParkingLot::unparkOneImpl):
(WTF::ParkingLot::unparkAll):
* wtf/ParkingLot.h:
(WTF::ParkingLot::parkConditionally):
(WTF::ParkingLot::compareAndPark):
(WTF::ParkingLot::unparkOne):
Tools:
* TestWebKitAPI/Tests/WTF/ParkingLot.cpp:
Canonical link: https://commits.webkit.org/178039@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@203350 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-07-18 18:32:52 +00:00
|
|
|
}
|
PerformanceTests:
Concurrent GC should be stable enough to land enabled
https://bugs.webkit.org/show_bug.cgi?id=164990
Reviewed by Geoffrey Garen.
Made CDjs more configurable and refined the "large.js" configuration. I was using that one and
the new "long.js" configuration to tune concurrent eden GCs.
Added a new way of running Splay in browser, which uses chartjs to plot the execution times of
2000 iterations. This includes the minified chartjs.
* JetStream/Octane2/splay-detail.html: Added.
* JetStream/cdjs/benchmark.js:
(benchmarkImpl):
(benchmark):
* JetStream/cdjs/long.js: Added.
Source/JavaScriptCore:
Concurrent GC should be stable enough to land enabled on X86_64
https://bugs.webkit.org/show_bug.cgi?id=164990
Reviewed by Geoffrey Garen.
This fixes a ton of performance and correctness bugs revealed by getting the concurrent GC to
be stable enough to land enabled.
I had to redo the JSObject::visitChildren concurrency protocol again. This time I think it's
even more correct than ever!
This is an enormous win on JetStream/splay-latency and Octane/SplayLatency. It looks to be
mostly neutral on everything else, though Speedometer is showing statistically weak signs of a
slight regression.
* API/JSAPIWrapperObject.mm: Added locking.
(JSC::JSAPIWrapperObject::visitChildren):
* API/JSCallbackObject.h: Added locking.
(JSC::JSCallbackObjectData::visitChildren):
(JSC::JSCallbackObjectData::JSPrivatePropertyMap::setPrivateProperty):
(JSC::JSCallbackObjectData::JSPrivatePropertyMap::deletePrivateProperty):
(JSC::JSCallbackObjectData::JSPrivatePropertyMap::visitChildren):
* CMakeLists.txt:
* JavaScriptCore.xcodeproj/project.pbxproj:
* bytecode/CodeBlock.cpp:
(JSC::CodeBlock::UnconditionalFinalizer::finalizeUnconditionally): This had a TOCTOU race on shouldJettisonDueToOldAge.
(JSC::EvalCodeCache::visitAggregate): Moved to EvalCodeCache.cpp.
* bytecode/DirectEvalCodeCache.cpp: Added. Outlined some functions and made them use locks.
(JSC::DirectEvalCodeCache::setSlow):
(JSC::DirectEvalCodeCache::clear):
(JSC::DirectEvalCodeCache::visitAggregate):
* bytecode/DirectEvalCodeCache.h:
(JSC::DirectEvalCodeCache::set):
(JSC::DirectEvalCodeCache::clear): Deleted.
* bytecode/UnlinkedCodeBlock.cpp: Added locking.
(JSC::UnlinkedCodeBlock::visitChildren):
(JSC::UnlinkedCodeBlock::setInstructions):
(JSC::UnlinkedCodeBlock::shrinkToFit):
* bytecode/UnlinkedCodeBlock.h: Added locking.
(JSC::UnlinkedCodeBlock::addRegExp):
(JSC::UnlinkedCodeBlock::addConstant):
(JSC::UnlinkedCodeBlock::addFunctionDecl):
(JSC::UnlinkedCodeBlock::addFunctionExpr):
(JSC::UnlinkedCodeBlock::createRareDataIfNecessary):
(JSC::UnlinkedCodeBlock::shrinkToFit): Deleted.
* debugger/Debugger.cpp: Use the right delete API.
(JSC::Debugger::recompileAllJSFunctions):
* dfg/DFGAbstractInterpreterInlines.h:
(JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects): Fix a pre-existing bug in ToFunction constant folding.
* dfg/DFGClobberize.h: Add support for nuking.
(JSC::DFG::clobberize):
* dfg/DFGClobbersExitState.cpp: Add support for nuking.
(JSC::DFG::clobbersExitState):
* dfg/DFGFixupPhase.cpp: Add support for nuking.
(JSC::DFG::FixupPhase::fixupNode):
(JSC::DFG::FixupPhase::indexForChecks):
(JSC::DFG::FixupPhase::originForCheck):
(JSC::DFG::FixupPhase::speculateForBarrier):
(JSC::DFG::FixupPhase::insertCheck):
(JSC::DFG::FixupPhase::fixupChecksInBlock):
* dfg/DFGSpeculativeJIT.cpp: Add support for nuking.
(JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage):
* ftl/FTLLowerDFGToB3.cpp: Add support for nuking.
(JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::reallocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::mutatorFence):
(JSC::FTL::DFG::LowerDFGToB3::nukeStructureAndSetButterfly):
(JSC::FTL::DFG::LowerDFGToB3::setButterfly): Deleted.
* heap/CodeBlockSet.cpp: We need to be more careful about the CodeBlockSet workflow during GC, since we will allocate CodeBlocks in eden while collecting.
(JSC::CodeBlockSet::clearMarksForFullCollection):
(JSC::CodeBlockSet::deleteUnmarkedAndUnreferenced):
* heap/Heap.cpp: Added code to measure max pauses. Added a better collectContinuously mode.
(JSC::Heap::lastChanceToFinalize): Stop the collectContinuously thread.
(JSC::Heap::harvestWeakReferences): Inline SlotVisitor::harvestWeakReferences.
(JSC::Heap::finalizeUnconditionalFinalizers): Inline SlotVisitor::finalizeUnconditionalReferences.
(JSC::Heap::markToFixpoint): We need to do some MarkedSpace stuff before every conservative scan, rather than just at the start of marking, so we now call prepareForConservativeScan() before each conservative scan. Also call a less-parallel version of drainInParallel when the mutator is running.
(JSC::Heap::collectInThread): Inline Heap::prepareForAllocation().
(JSC::Heap::stopIfNecessarySlow): We need to be more careful about ensuring that we run finalization before and after stopping. Also, we should sanitize stack when stopping the world.
(JSC::Heap::acquireAccessSlow): Add some optional debug prints.
(JSC::Heap::handleNeedFinalize): Assert that we are running this when the world is not stopped.
(JSC::Heap::finalize): Remove the old collectContinuously code.
(JSC::Heap::requestCollection): We don't need to sanitize stack here anymore.
(JSC::Heap::notifyIsSafeToCollect): Start the collectContinuously thread. It will request collection at 1 KHz.
(JSC::Heap::prepareForAllocation): Deleted.
(JSC::Heap::preventCollection): Prevent any new concurrent GCs from being initiated.
(JSC::Heap::allowCollection):
(JSC::Heap::forEachSlotVisitor): Allows us to safely iterate slot visitors.
* heap/Heap.h:
* heap/HeapInlines.h:
(JSC::Heap::writeBarrier): If the 'to' cell is not NewWhite then it could be AnthraciteOrBlack. During a full collection, objects may be AnthraciteOrBlack from a previous GC. Turns out, we don't benefit from this optimization so we can just kill it.
* heap/HeapSnapshotBuilder.cpp:
(JSC::HeapSnapshotBuilder::buildSnapshot): This needs to use PreventCollectionScope to ensure snapshot soundness.
* heap/ListableHandler.h:
(JSC::ListableHandler::isOnList): Useful helper.
* heap/LockDuringMarking.h:
(JSC::lockDuringMarking): It's a locker that only locks while we're marking.
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::addBlock): Hold the bitvector lock while resizing.
* heap/MarkedBlock.cpp: Hold the bitvector lock while accessing the bitvectors while the mutator is running.
* heap/MarkedSpace.cpp:
(JSC::MarkedSpace::prepareForConservativeScan): We used to do this in prepareForMarking, but we need to do it before each conservative scan not just before marking.
(JSC::MarkedSpace::prepareForMarking): Remove the logic moved to prepareForConservativeScan.
* heap/MarkedSpace.h:
* heap/PreventCollectionScope.h: Added.
* heap/SlotVisitor.cpp: Refactored drainFromShared so that we can write a similar function called drainInParallelPassively.
(JSC::SlotVisitor::updateMutatorIsStopped): Update whether we can use "fast" scanning.
(JSC::SlotVisitor::mutatorIsStoppedIsUpToDate):
(JSC::SlotVisitor::didReachTermination):
(JSC::SlotVisitor::hasWork):
(JSC::SlotVisitor::drain): This now uses the rightToRun lock to allow the main GC thread to safepoint the workers.
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::drainInParallelPassively): This runs marking with one fewer threads than normal. It's useful for when we have resumed the mutator, since then the mutator has a better chance of getting on a core.
(JSC::SlotVisitor::addWeakReferenceHarvester):
(JSC::SlotVisitor::addUnconditionalFinalizer):
(JSC::SlotVisitor::harvestWeakReferences): Deleted.
(JSC::SlotVisitor::finalizeUnconditionalFinalizers): Deleted.
* heap/SlotVisitor.h:
* heap/SlotVisitorInlines.h: Outline stuff.
(JSC::SlotVisitor::addWeakReferenceHarvester): Deleted.
(JSC::SlotVisitor::addUnconditionalFinalizer): Deleted.
* runtime/InferredType.cpp: This needed thread safety.
(JSC::InferredType::visitChildren): This needs to keep its structure finalizer alive until it runs.
(JSC::InferredType::set):
(JSC::InferredType::InferredStructureFinalizer::finalizeUnconditionally):
* runtime/InferredType.h:
* runtime/InferredValue.cpp: This needed thread safety.
(JSC::InferredValue::visitChildren):
(JSC::InferredValue::ValueCleanup::finalizeUnconditionally):
* runtime/JSArray.cpp:
(JSC::JSArray::unshiftCountSlowCase): Update to use new butterfly API.
(JSC::JSArray::unshiftCountWithArrayStorage): Update to use new butterfly API.
* runtime/JSArrayBufferView.cpp:
(JSC::JSArrayBufferView::visitChildren): Thread safety.
* runtime/JSCell.h:
(JSC::JSCell::setStructureIDDirectly): This is used for nuking the structure.
(JSC::JSCell::InternalLocker::InternalLocker): Deleted. The cell is now the lock.
(JSC::JSCell::InternalLocker::~InternalLocker): Deleted. The cell is now the lock.
* runtime/JSCellInlines.h:
(JSC::JSCell::structure): Clean this up.
(JSC::JSCell::lock): The cell is now the lock.
(JSC::JSCell::tryLock):
(JSC::JSCell::unlock):
(JSC::JSCell::isLocked):
(JSC::JSCell::lockInternalLock): Deleted.
(JSC::JSCell::unlockInternalLock): Deleted.
* runtime/JSFunction.cpp:
(JSC::JSFunction::visitChildren): Thread safety.
* runtime/JSGenericTypedArrayViewInlines.h:
(JSC::JSGenericTypedArrayView<Adaptor>::visitChildren): Thread safety.
(JSC::JSGenericTypedArrayView<Adaptor>::slowDownAndWasteMemory): Thread safety.
* runtime/JSObject.cpp:
(JSC::JSObject::markAuxiliaryAndVisitOutOfLineProperties): Factor out this "easy" step of butterfly visiting.
(JSC::JSObject::visitButterfly): Make this achieve 100% precision about structure-butterfly relationships. This relies on the mutator "nuking" the structure prior to "locked" structure-butterfly transitions.
(JSC::JSObject::visitChildren): Use the new, nicer API.
(JSC::JSFinalObject::visitChildren): Use the new, nicer API.
(JSC::JSObject::enterDictionaryIndexingModeWhenArrayStorageAlreadyExists): Use the new butterfly API.
(JSC::JSObject::createInitialUndecided): Use the new butterfly API.
(JSC::JSObject::createInitialInt32): Use the new butterfly API.
(JSC::JSObject::createInitialDouble): Use the new butterfly API.
(JSC::JSObject::createInitialContiguous): Use the new butterfly API.
(JSC::JSObject::createArrayStorage): Use the new butterfly API.
(JSC::JSObject::convertUndecidedToContiguous): Use the new butterfly API.
(JSC::JSObject::convertUndecidedToArrayStorage): Use the new butterfly API.
(JSC::JSObject::convertInt32ToArrayStorage): Use the new butterfly API.
(JSC::JSObject::convertDoubleToContiguous): Use the new butterfly API.
(JSC::JSObject::convertDoubleToArrayStorage): Use the new butterfly API.
(JSC::JSObject::convertContiguousToArrayStorage): Use the new butterfly API.
(JSC::JSObject::increaseVectorLength): Use the new butterfly API.
(JSC::JSObject::shiftButterflyAfterFlattening): Use the new butterfly API.
* runtime/JSObject.h:
(JSC::JSObject::setButterfly): This now does all of the fences. Only use this when you are not also transitioning the structure or the structure's lastOffset.
(JSC::JSObject::nukeStructureAndSetButterfly): Use this when doing locked structure-butterfly transitions.
* runtime/JSObjectInlines.h:
(JSC::JSObject::putDirectWithoutTransition): Use the newly factored out API.
(JSC::JSObject::prepareToPutDirectWithoutTransition): Factor this out!
(JSC::JSObject::putDirectInternal): Use the newly factored out API.
* runtime/JSPropertyNameEnumerator.cpp:
(JSC::JSPropertyNameEnumerator::finishCreation): Locks!
(JSC::JSPropertyNameEnumerator::visitChildren): Locks!
* runtime/JSSegmentedVariableObject.cpp:
(JSC::JSSegmentedVariableObject::visitChildren): Locks!
* runtime/JSString.cpp:
(JSC::JSString::visitChildren): Thread safety.
* runtime/ModuleProgramExecutable.cpp:
(JSC::ModuleProgramExecutable::visitChildren): Thread safety.
* runtime/Options.cpp: For now we disable concurrent GC on not-X86_64.
(JSC::recomputeDependentOptions):
* runtime/Options.h: Change the default max GC parallelism to 8. I don't know why it was still 7.
* runtime/SamplingProfiler.cpp:
(JSC::SamplingProfiler::stackTracesAsJSON): This needs to defer GC before grabbing its lock.
* runtime/SparseArrayValueMap.cpp: This needed thread safety.
(JSC::SparseArrayValueMap::add):
(JSC::SparseArrayValueMap::remove):
(JSC::SparseArrayValueMap::visitChildren):
* runtime/SparseArrayValueMap.h:
* runtime/Structure.cpp: This had a race between addNewPropertyTransition and visitChildren.
(JSC::Structure::Structure):
(JSC::Structure::materializePropertyTable):
(JSC::Structure::addNewPropertyTransition):
(JSC::Structure::flattenDictionaryStructure):
(JSC::Structure::add): Help out with nuking support - the m_offset needs to play along.
(JSC::Structure::visitChildren):
* runtime/Structure.h: Make some useful things public - like the notion of a lastOffset.
* runtime/StructureChain.cpp:
(JSC::StructureChain::visitChildren): Thread safety!
* runtime/StructureChain.h: Thread safety!
* runtime/StructureIDTable.cpp:
(JSC::StructureIDTable::allocateID): Ensure that we don't get nuked IDs.
* runtime/StructureIDTable.h: Add the notion of a nuked ID! It's a bit that the runtime never sees except during specific shady actions like locked structure-butterfly transitions. "Nuking" tells the GC to steer clear and rescan once we fire the barrier.
(JSC::nukedStructureIDBit):
(JSC::nuke):
(JSC::isNuked):
(JSC::decontaminate):
* runtime/StructureInlines.h:
(JSC::Structure::hasIndexingHeader): Better API.
(JSC::Structure::add):
* runtime/VM.cpp: Better GC interaction.
(JSC::VM::ensureWatchdog):
(JSC::VM::deleteAllLinkedCode):
(JSC::VM::deleteAllCode):
* runtime/VM.h:
(JSC::VM::getStructure): Why wasn't this always an API!
* runtime/WebAssemblyExecutable.cpp:
(JSC::WebAssemblyExecutable::visitChildren): Thread safety.
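As an illustration of the "nuked ID" notion described for runtime/StructureIDTable.h above, here is a
minimal sketch of the three helpers. It assumes that structure IDs are 32-bit integers and that the
nuked bit is the top bit; the type name StructureIDSketch and the bit position are assumptions made
for this example, not values taken from the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for the structure ID type; 32 bits is an assumption.
using StructureIDSketch = uint32_t;

// Assumed position of the nuked bit: the top bit, which a table that never
// hands out IDs this large would leave free.
constexpr StructureIDSketch nukedStructureIDBitSketch = 0x80000000u;

// Mark an ID as nuked, telling a concurrent visitor to steer clear for now.
constexpr StructureIDSketch nuke(StructureIDSketch id)
{
    return id | nukedStructureIDBitSketch;
}

// True if the ID currently carries the nuked bit.
constexpr bool isNuked(StructureIDSketch id)
{
    return (id & nukedStructureIDBitSketch) != 0;
}

// Strip the nuked bit to recover the ID that indexes the table.
constexpr StructureIDSketch decontaminate(StructureIDSketch id)
{
    return id & ~nukedStructureIDBitSketch;
}
```

In this sketch a mutator would store nuke(id) before reshaping the butterfly and store the new plain ID
afterwards, while a visitor that sees isNuked(id) would skip the object and rely on the barrier to
rescan it.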
Source/WebCore:
Concurrent GC should be stable enough to land enabled on X86_64
https://bugs.webkit.org/show_bug.cgi?id=164990
Reviewed by Geoffrey Garen.
Made WebCore down with concurrent marking by adding some locking and adapting to some new API.
This has new test modes in run-jsc-stress-tests. Also, the way that LayoutTests run is already
a fantastic GC test.
* ForwardingHeaders/heap/DeleteAllCodeEffort.h: Added.
* ForwardingHeaders/heap/LockDuringMarking.h: Added.
* bindings/js/GCController.cpp:
(WebCore::GCController::deleteAllCode):
(WebCore::GCController::deleteAllLinkedCode):
* bindings/js/GCController.h:
* bindings/js/JSDOMBinding.cpp:
(WebCore::getCachedDOMStructure):
(WebCore::cacheDOMStructure):
* bindings/js/JSDOMGlobalObject.cpp:
(WebCore::JSDOMGlobalObject::addBuiltinGlobals):
(WebCore::JSDOMGlobalObject::visitChildren):
* bindings/js/JSDOMGlobalObject.h:
(WebCore::getDOMConstructor):
* bindings/js/JSDOMPromise.cpp:
(WebCore::DeferredPromise::DeferredPromise):
(WebCore::DeferredPromise::clear):
* bindings/js/JSXPathResultCustom.cpp:
(WebCore::JSXPathResult::visitAdditionalChildren):
* dom/EventListenerMap.cpp:
(WebCore::EventListenerMap::clear):
(WebCore::EventListenerMap::replace):
(WebCore::EventListenerMap::add):
(WebCore::EventListenerMap::remove):
(WebCore::EventListenerMap::find):
(WebCore::EventListenerMap::removeFirstEventListenerCreatedFromMarkup):
(WebCore::EventListenerMap::copyEventListenersNotCreatedFromMarkupToTarget):
(WebCore::EventListenerIterator::EventListenerIterator):
* dom/EventListenerMap.h:
(WebCore::EventListenerMap::lock):
* dom/EventTarget.cpp:
(WebCore::EventTarget::visitJSEventListeners):
* dom/EventTarget.h:
(WebCore::EventTarget::visitJSEventListeners): Deleted.
* dom/Node.cpp:
(WebCore::Node::eventTargetDataConcurrently):
(WebCore::Node::ensureEventTargetData):
(WebCore::Node::clearEventTargetData):
* dom/Node.h:
* page/MemoryRelease.cpp:
(WebCore::releaseCriticalMemory):
* page/cocoa/MemoryReleaseCocoa.mm:
(WebCore::jettisonExpensiveObjectsOnTopLevelNavigation):
(WebCore::registerMemoryReleaseNotifyCallbacks):
Source/WTF:
Concurrent GC should be stable enough to land enabled on X86_64
https://bugs.webkit.org/show_bug.cgi?id=164990
Reviewed by Geoffrey Garen.
Adds the ability to say:
auto locker = holdLock(any type of lock)
Instead of having to say:
Locker<LockType> locker(locks of type LockType)
I think that we should use "auto locker = holdLock(lock)" as the default way that we acquire
locks unless we need to use a special locker type.
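The idiom above can be sketched outside of WTF as follows. This is only an illustration of the shape
of the API: LockerSketch and holdLockSketch are hypothetical names, and the real WTF::Locker has more
features (such as tryLock and operator bool) than are shown here.

```cpp
#include <utility>

// Hypothetical stand-in for WTF::Locker: an RAII guard that is movable so
// that a factory function can return it by value.
template<typename LockType>
class LockerSketch {
public:
    explicit LockerSketch(LockType& lock)
        : m_lock(&lock)
    {
        m_lock->lock();
    }

    LockerSketch(LockerSketch&& other) noexcept
        : m_lock(std::exchange(other.m_lock, nullptr))
    {
    }

    LockerSketch(const LockerSketch&) = delete;
    LockerSketch& operator=(const LockerSketch&) = delete;

    ~LockerSketch()
    {
        // A moved-from guard no longer owns the lock and must not unlock it.
        if (m_lock)
            m_lock->unlock();
    }

private:
    LockType* m_lock;
};

// Deduces LockType from the argument, so callers never spell it out.
template<typename LockType>
LockerSketch<LockType> holdLockSketch(LockType& lock)
{
    return LockerSketch<LockType>(lock);
}
```

With this in place, "auto locker = holdLockSketch(lock);" works for any type that has lock() and
unlock() methods, including std::mutex.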
This also adds the ability to safepoint a lock. Safepointing a lock is basically a super fast
way of unlocking it fairly and then immediately relocking it - i.e. letting anyone who is
waiting run, without losing steam if there is no one waiting.
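A rough sketch of the safepoint idea, assuming a lock that tracks how many threads are waiting. The
class SafepointLockSketch is hypothetical and is built on std::mutex, which does not hand the lock off
fairly, so this only illustrates the fast path (nobody waiting, do nothing) versus the slow path
(release, let a waiter run, reacquire). It is not how WTF::Lock implements it.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

class SafepointLockSketch {
public:
    void lock()
    {
        // Count ourselves as a waiter for as long as we are trying to acquire.
        m_waiters.fetch_add(1);
        m_mutex.lock();
        m_waiters.fetch_sub(1);
    }

    void unlock() { m_mutex.unlock(); }

    // Must be called with the lock held. Returns true if the lock was
    // released and reacquired, false if the fast path found no waiters.
    bool safepoint()
    {
        if (!m_waiters.load())
            return false; // Fast path: nobody is waiting, keep running.

        // Slow path: give waiters a chance, then get back in line.
        unlock();
        std::this_thread::yield();
        lock();
        return true;
    }

private:
    std::mutex m_mutex;
    std::atomic<int> m_waiters { 0 };
};
```

A long-running loop that holds the lock could call safepoint() once per iteration. In the uncontended
case this costs a single atomic load.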
* wtf/Lock.cpp:
(WTF::LockBase::safepointSlow):
* wtf/Lock.h:
(WTF::LockBase::safepoint):
* wtf/LockAlgorithm.h:
(WTF::LockAlgorithm::safepointFast):
(WTF::LockAlgorithm::safepoint):
(WTF::LockAlgorithm::safepointSlow):
* wtf/Locker.h:
(WTF::AbstractLocker::AbstractLocker):
(WTF::Locker::tryLock):
(WTF::Locker::operator bool):
(WTF::Locker::Locker):
(WTF::Locker::operator=):
(WTF::holdLock):
(WTF::tryHoldLock):
Tools:
Concurrent GC should be stable enough to land enabled
https://bugs.webkit.org/show_bug.cgi?id=164990
Reviewed by Geoffrey Garen.
Add a new mode that runs GC continuously. Also made eager modes run GC continuously.
It's clear that this works just fine in release, but I'm still trying to figure out if it's
safe for debug. It might be too slow for debug.
* Scripts/run-jsc-stress-tests:
Canonical link: https://commits.webkit.org/183229@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@209570 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-12-08 22:14:50 +00:00
void safepoint()
{
    if (UNLIKELY(!DefaultLockAlgorithm::safepointFast(m_byte)))
        safepointSlow();
}
WTF::Lock should be fair eventually
https://bugs.webkit.org/show_bug.cgi?id=159384
Reviewed by Geoffrey Garen.
Source/WTF:
In https://webkit.org/blog/6161/locking-in-webkit/ we showed how relaxing the fairness of
locks makes them fast. That post presented lock fairness as a trade-off between two
extremes:
- Barging. A barging lock, like WTF::Lock, releases the lock in unlock() even if there was a
thread on the queue. If there was a thread on the queue, the lock is released and that
thread is made runnable. That thread may then grab the lock, or some other thread may grab
the lock first (it may barge). Usually, the barging thread is the thread that released the
lock in the first place. This maximizes throughput but hurts fairness. There is no good
theoretical bound on how unfair the lock may become, but empirical data suggests that it's
fair enough for the cases we previously measured.
- FIFO. A FIFO lock, like HandoffLock in ToyLocks.h, does not release the lock in unlock()
if there is a thread waiting. If there is a thread waiting, unlock() will make that thread
runnable and inform it that it now holds the lock. This ensures perfect round-robin
fairness and allows us to reason theoretically about how long it may take for a thread to
grab the lock. For example, if we know that only N threads are running and each one may
contend on a critical section, and each one may hold the lock for at most S seconds, then
the time it takes to grab the lock is at most N * S. Unfortunately, FIFO locks perform very badly
in most cases. This is because for the common case of short critical sections, they force
a context switch after each critical section if the lock is contended.
This change makes WTF::Lock almost as fair as FIFO while still being as fast as barging.
Thanks to this new algorithm, you can now have both of these things at the same time.
This change makes WTF::Lock eventually fair. We can almost (more on the caveats below)
guarantee that the time it takes to grab a lock is N * max(1ms, S). In other words, critical
sections that are longer than 1ms are always fair. For shorter critical sections, the amount
of time that any thread waits is 1ms times the number of threads. There are some caveats
that arise from our use of randomness, but even then, in the limit as the critical section
length goes to infinity, the lock becomes fair. The corner cases are unlikely to happen; our
experiments show that the lock becomes exactly as fair as a FIFO lock for any critical
section that is 1ms or longer.
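To make the two bounds concrete, here is a small worked calculation. The function names are made up for
this example; they only evaluate the formulas N * S and N * max(1ms, S) quoted above, with times
in milliseconds.

```cpp
#include <algorithm>

// Worst-case wait for a FIFO lock: each of the N threads may hold the lock
// for up to S milliseconds before it is our turn.
double fifoWorstCaseWaitMs(int threadCount, double criticalSectionMs)
{
    return threadCount * criticalSectionMs;
}

// Approximate worst-case wait for the eventually fair lock: sections shorter
// than 1ms are rounded up to 1ms, because fairness kicks in about once per
// millisecond.
double eventuallyFairWorstCaseWaitMs(int threadCount, double criticalSectionMs)
{
    return threadCount * std::max(1.0, criticalSectionMs);
}
```

For 10 threads with 0.25ms critical sections, the FIFO bound is 2.5ms while the eventually fair bound
is 10ms. For 5ms critical sections both bounds are 50ms, which matches the claim that critical
sections of 1ms or longer are treated as fairly as under FIFO.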
The fairness mechanism is broken into two parts. WTF::Lock can now choose to unlock a lock
fairly or unfairly thanks to the new ParkingLot token mechanism. WTF::Lock knows when to use
fair unlocking based on a timeout mechanism in ParkingLot called timeToBeFair.
ParkingLot::unparkOne() and ParkingLot::parkConditionally() can now communicate with each
other via a token. unparkOne() can pass a token, which parkConditionally() will return. This
change also makes parkConditionally() a lot more precise about when it was unparked due to a
call to unparkOne(). If unparkOne() is told that a thread was unparked then this thread is
guaranteed to report that it was unparked rather than timing out, and that thread is
guaranteed to get the token that unparkOne() passed. The token is an intptr_t. We use it as
a boolean variable in WTF::Lock, but you could use it to pass arbitrary data structures. By
default, the token is zero. WTF::Lock's unlock() will pass 1 as the token if it is doing
fair unlocking. In that case, unlock() will not release the lock, and lock() will know that
it holds the lock as soon as parkConditionally() returns. Note that this algorithm relies
on unparkOne() invoking WTF::Lock's callback while the queue lock is held, so that WTF::Lock
can make a decision about unlock strategy and inject a token while it has complete knowledge
over the state of the queue. As such, it's not immediately obvious how to implement this
algorithm on top of futexes. You really need ParkingLot!
WTF::Lock does not use fair unlocking every time. We expose a new API, Lock::unlockFairly(),
which forces the fair unlocking behavior. Additionally, ParkingLot now maintains a
per-bucket stochastic fairness timeout. When the timeout fires, the unparkOne() callback
sees UnparkResult::timeToBeFair = true. This timeout is set to be anywhere from 0ms to 1ms
at random. When a dequeue happens and there are threads that actually get dequeued, we check
if the time since the last fair unlock (the last time timeToBeFair was set to true) is
more than the timeout amount. If so, then we set timeToBeFair to true and reset the timeout.
This means that in the absence of ParkingLot collisions, fair unlocking is guaranteed to
happen at least once per millisecond. It will happen at 2 KHz on average. If there are
collisions, then each collision adds one millisecond to the worst case (and 0.5 ms to the
average case). The reason why we don't just use a fixed 1ms timeout is that we want to avoid
resonance. Imagine a program in which some thread acquires a lock at 1 KHz in-phase with the
timeToBeFair timeout. Then this thread would be the beneficiary of fairness to the detriment
of everyone else. Randomness ensures that we aren't too fair to any one thread.
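The stochastic timeout can be sketched as a small helper that is consulted on each dequeue. The class
name FairnessTimerSketch, the explicit "now" parameter, and the choice of random number generator are
assumptions made to keep the example deterministic and testable. In ParkingLot this state lives per
bucket.

```cpp
#include <chrono>
#include <random>

class FairnessTimerSketch {
public:
    using Clock = std::chrono::steady_clock;

    explicit FairnessTimerSketch(Clock::time_point now, unsigned seed = 0)
        : m_random(seed)
    {
        reset(now);
    }

    // Call when a thread is actually being dequeued. Returns true if the
    // randomized deadline has passed, in which case the unlock should be
    // fair and a fresh deadline is chosen.
    bool timeToBeFair(Clock::time_point now)
    {
        if (now <= m_nextFairTime)
            return false;
        reset(now);
        return true;
    }

private:
    void reset(Clock::time_point now)
    {
        // Pick a timeout anywhere from 0ms to 1ms to avoid resonating with
        // a thread that happens to acquire the lock at a steady 1 KHz.
        std::uniform_int_distribution<long long> microseconds(0, 1000);
        m_nextFairTime = now + std::chrono::duration_cast<Clock::duration>(
            std::chrono::microseconds(microseconds(m_random)));
    }

    std::mt19937 m_random;
    Clock::time_point m_nextFairTime;
};
```

Because each timeout is at most 1ms, any dequeue that happens more than 1ms after the previous fair
unlock is fair, and the average timeout of 0.5ms gives the 2 KHz figure mentioned above.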
Empirically, this is neutral on our major benchmarks like JetStream but it's an enormous
improvement in LockFairnessTest. It's common for an unfair lock (either our BargingLock, the
old WTF::Lock, any of the other futex-based locks that barge, or new os_unfair_lock) to
allow only one thread to hold the lock during a whole second in which each thread is holding
the lock for 1ms at a time. This is because in a barging lock, releasing a lock after
holding it for 1ms and then reacquiring it immediately virtually ensures that none of the
other threads can wake up in time to grab it before it's relocked. But the new WTF::Lock
handles this case like a champ: each thread gets equal turns.
Here's some data. If we launch 10 threads and have each of them run for 1 second while
repeatedly holding a critical section for 1ms, then here's how many times each thread gets
to hold the lock using the old WTF::Lock algorithm:
799, 6, 1, 1, 1, 1, 1, 1, 1, 1
One thread hogged the lock for almost the whole time! With the new WTF::Lock, the lock
becomes totally fair:
80, 79, 79, 79, 79, 79, 79, 80, 80, 79
I don't know of anyone creating such an automatically-fair adaptive lock before, so I think
that this is a pretty awesome advancement to the state of the art!
This change is good for three reasons:
- We do have long critical sections in WebKit and we don't want to have to worry about
starvation. This reduces the likelihood that we will see starvation due to our lock
strategy.
- I was talking to ggaren about bmalloc's locking needs, and he wanted unlockFairly() or
lockFairly() or some moral equivalent for the scavenger thread.
- If we use a WTF::Lock to manage heap access in a multithreaded GC, we'll need the ability
to unlock and relock without barging.
* benchmarks/LockFairnessTest.cpp:
(main):
* benchmarks/ToyLocks.h:
* wtf/Condition.h:
(WTF::ConditionBase::waitUntil):
(WTF::ConditionBase::notifyOne):
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl):
* wtf/Lock.h:
(WTF::LockBase::try_lock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(WTF::LockBase::isFullyReset):
* wtf/ParkingLot.cpp:
(WTF::ParkingLot::parkConditionallyImpl):
(WTF::ParkingLot::unparkOne):
(WTF::ParkingLot::unparkOneImpl):
(WTF::ParkingLot::unparkAll):
* wtf/ParkingLot.h:
(WTF::ParkingLot::parkConditionally):
(WTF::ParkingLot::compareAndPark):
(WTF::ParkingLot::unparkOne):
Tools:
* TestWebKitAPI/Tests/WTF/ParkingLot.cpp:
Canonical link: https://commits.webkit.org/178039@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@203350 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-07-18 18:32:52 +00:00
Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
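The three levels of adaptation can be sketched in portable C++ as follows. This is not the WTF::Lock
implementation: instead of a tagged pointer to a queue of statically allocated thread nodes, the
hypothetical AdaptiveLockSketch below keeps the lock state in an atomic flag and uses a
std::condition_variable as its sleep queue, so it is much larger than sizeof(void*). The spin limit of
40 is an arbitrary value chosen for the example.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class AdaptiveLockSketch {
public:
    void lock()
    {
        // Level 1: inlined fast path, a single CAS on an unheld lock.
        bool expected = false;
        if (m_isHeld.compare_exchange_weak(expected, true))
            return;
        lockSlow();
    }

    void unlock()
    {
        m_isHeld.store(false);
        // Only touch the queue if somebody went to sleep.
        if (m_sleepers.load() > 0) {
            std::lock_guard<std::mutex> queueLocker(m_queueMutex);
            m_wakeUp.notify_one();
        }
    }

private:
    void lockSlow()
    {
        // Level 2: a short burst of spinning for microcontention.
        for (int i = 0; i < spinLimit; ++i) {
            bool expected = false;
            if (m_isHeld.compare_exchange_weak(expected, true))
                return;
            std::this_thread::yield();
        }

        // Level 3: register as a sleeper and block until woken.
        std::unique_lock<std::mutex> queueLocker(m_queueMutex);
        m_sleepers.fetch_add(1);
        for (;;) {
            bool expected = false;
            if (m_isHeld.compare_exchange_strong(expected, true))
                break;
            m_wakeUp.wait(queueLocker);
        }
        m_sleepers.fetch_sub(1);
    }

    static constexpr int spinLimit = 40;
    std::atomic<bool> m_isHeld { false };
    std::atomic<int> m_sleepers { 0 };
    std::mutex m_queueMutex;
    std::condition_variable m_wakeUp;
};

// Usage example: several threads increment a shared counter under the lock.
int countWithLock(int threadCount, int iterationsPerThread)
{
    AdaptiveLockSketch lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterationsPerThread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& thread : threads)
        thread.join();
    return counter;
}
```

Like the lock described above, this sketch barges: unlock() releases the lock before waking a sleeper,
so a running thread may grab it first and the woken thread simply goes back to sleep.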
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
This patch replaces all uses of SpinLock with Lock, since our critical sections are not
no-ops so if you do basically anything in your critical section, the Lock overhead will be
invisible. Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
as it should be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space benefits
of having a lock that just uses a byte are still better than the CPU wastage benefits of
Lock. But, this work will enable some future work to create locks that will fit in just 1.6
bits: https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to slow path spuriously, it may go there even if
there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal with the
possibility of a null queue head.
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
bool isHeld() const
{
The GC should be optionally concurrent and disabled by default
https://bugs.webkit.org/show_bug.cgi?id=164454
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
This started out as a patch to have the GC scan the stack at the end, and then the
outage happened and I decided to pick a more aggressive target: give the GC a concurrent
mode that can be enabled at runtime, and whose only effect is that it turns on the
ResumeTheWorldScope. This gives our GC a really intuitive workflow: by default, the GC
thread is running solo with the world stopped and the parallel markers converged and
waiting. We have a parallel work scope to enable the parallel markers and now we have a
ResumeTheWorldScope that will optionally resume the world and then stop it again.
It's easy to make a concurrent GC that always instantly crashes. I can't promise that
this one won't do that when you run it. I set a specific goal: I wanted to do >10
concurrent GCs in debug mode with generations, optimizing JITs, and parallel marking
disabled.
To reach this milestone, I needed to do a bunch of stuff:
- The mutator needs a separate mark stack for the barrier, since it will mutate this
stack concurrently to the collector's slot visitors.
- The use of CellState to indicate whether an object is being scanned the first time or
a subsequent time was racy. It fails spectacularly when a barrier is fired at the same
time as visitChildren is running or if the barrier runs at the same time as the GC
marks the same object. So, I split SlotVisitor's mark stacks. It's now the case that
you know why you're being scanned by looking at which stack you came off of.
- All of root marking must be in the collector fixpoint. I renamed markRoots to
markToFixpoint. They say concurrency is hard, but the collector looks more intuitive
this way. We never gained anything from forcing people to make a choice between
scanning something in the fixpoint versus outside of it. Because root scanning is
cheap, we can afford to do it repeatedly, which means all root scanning can now do
constraint-based marking (like: I'll mark you if that thing is marked).
- JSObject::visitChildren's scanning of the butterfly raced with property additions,
indexed storage transitions and resizing, and a bunch of miscellaneous dirty butterfly
reshaping functions - like the one that flattens a dictionary and some sneaky
ArrayStorage transformations. Many of these can be fixed by using store-store fences
in the mutator and load-load fences in the collector. I've adopted the rule that the
collector must always see either a butterfly and structure that match or a newer
butterfly with an older structure, where their age is just one transition apart. This
can be achieved with fences. For the cases where it breaks down, I added a lock to
every JSCell. This is a full-fledged WTF lock that we sneak into two available bits in
the indexingType. See the WTF ChangeLog for details.
The mutator fencing rules are as follows:
- Store-store fence before and after setting the butterfly.
- Store-store fence before setting structure if you had changed the shape of the
butterfly.
- Store-store fence after initializing all fields in an allocation.
- A dictionary Structure can change in strange ways while the GC is trying to scan it.
So, JSObject::visitChildren will now grab the object's structure's lock if the
object's structure is a dictionary. Dictionary structures are 1:1 with their object,
so this does not reduce GC parallelism (super unlikely that the GC will simultaneously
scan an object from two threads).
- The GC can blow away a Structure's property table at any time. As a small consolation,
it's now holding the Structure's lock when it does so. But there was tons of code in
Structure that uses DeferGC to prevent the GC from blowing away the property table.
This doesn't work with concurrent GC, since DeferGC only means that the GC won't run
its safepoint (i.e. stop-the-world code) in the DeferGC region. It will still do
marking and it was the Structure::visitChildren that would delete the table. It turns
out that Structure's reliance on the property table not being deleted was the product
of code rot. We already had functions that would materialize the table on demand. We
were simply making the mistake of saying:
structure->materializePropertyMap();
...
structure->propertyTable()->things
Instead of saying:
PropertyTable* table = structure->ensurePropertyTable();
...
table->things
Switching the code to use the latter idiom allowed me to simplify the code a lot while
fixing the race.
- The LLInt's get_by_val handling was broken because the indexing shape constants were
wrong. Once I started putting more things into the IndexingType, that started causing
crashes for me. So I fixed LLInt. That turned out to be a lot of work, since that code
had rotted in subtle ways.
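The mutator fencing rules listed above can be sketched roughly as follows. This is not the JavaScriptCore code: ToyObject, ToyStructure, ToyButterfly, growAndTransition and collectorSnapshot are hypothetical names, and portable C++ release/acquire fences stand in for the store-store and load-load fences (they are at least as strong). The collector-side load order, structure first and then butterfly, is an assumption of this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

struct ToyStructure { std::size_t propertyCount; };
struct ToyButterfly { std::size_t capacity; };

struct ToyObject {
    ToyObject(const ToyStructure* s, const ToyButterfly* b) : structure(s), butterfly(b) { }
    std::atomic<const ToyStructure*> structure;
    std::atomic<const ToyButterfly*> butterfly;
};

// Stand-ins for store-store and load-load fences (assumption: release and
// acquire fences are used here because standard C++ has no weaker equivalent).
inline void storeStoreFence() { std::atomic_thread_fence(std::memory_order_release); }
inline void loadLoadFence() { std::atomic_thread_fence(std::memory_order_acquire); }

// Mutator side: publish the new butterfly before the structure that describes it.
void growAndTransition(ToyObject& object, const ToyButterfly* newButterfly, const ToyStructure* newStructure)
{
    storeStoreFence(); // The butterfly's contents are initialized before it is published.
    object.butterfly.store(newButterfly, std::memory_order_relaxed);
    storeStoreFence(); // After setting the butterfly, and before setting the structure.
    object.structure.store(newStructure, std::memory_order_relaxed);
}

struct Snapshot {
    const ToyStructure* structure;
    const ToyButterfly* butterfly;
};

// Collector side: read the structure first, then the butterfly, so the pair is
// either matching or a newer butterfly with an older structure.
Snapshot collectorSnapshot(const ToyObject& object)
{
    Snapshot snapshot;
    snapshot.structure = object.structure.load(std::memory_order_relaxed);
    loadLoadFence();
    snapshot.butterfly = object.butterfly.load(std::memory_order_relaxed);
    return snapshot;
}
```

With this ordering, a collector that observes the new structure is guaranteed to observe the new butterfly as well, so it never pairs a newer structure with an older, smaller butterfly.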
This is a speed-up in SunSpider, probably because of the LLInt fix. This is neutral on
Octane and Kraken. It's a small slow-down on LongSpider, but I think we can ignore
that (we don't view LongSpider as an official benchmark). By default, the concurrent GC
is disabled: in all of the places where it would have resumed the world to run marking
concurrently to the mutator, it will just skip the resume step. When you enable
concurrent GC (--useConcurrentGC=true), it can sometimes run Octane/splay to completion.
It seems to perform quite well: on my machine, it improves both splay-throughput and
splay-latency. It's probably unstable for other programs.
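The "skip the resume step when concurrent GC is disabled" behavior described above could look something like the following RAII sketch. ToyHeap and ToyResumeTheWorldScope are hypothetical stand-ins and are not the actual JSC::Heap API. The sketch only shows the shape of a scope that resumes the world on entry and stops it again on exit, and that does nothing when the option is off.

```cpp
#include <cassert>

// Hypothetical heap model: only tracks whether the mutator ("the world") is stopped.
struct ToyHeap {
    bool useConcurrentGC = false; // Mirrors the idea of --useConcurrentGC; off by default.
    bool worldIsStopped = true;   // The collector starts with the world stopped.
    int resumeCount = 0;

    void resumeTheWorld()
    {
        assert(worldIsStopped);
        worldIsStopped = false;
        ++resumeCount;
    }

    void stopTheWorld()
    {
        assert(!worldIsStopped);
        worldIsStopped = true;
    }
};

// Resumes the world for the lifetime of the scope, but only in concurrent mode.
class ToyResumeTheWorldScope {
public:
    explicit ToyResumeTheWorldScope(ToyHeap& heap)
        : m_heap(heap)
        , m_didResume(heap.useConcurrentGC)
    {
        if (m_didResume)
            m_heap.resumeTheWorld();
    }

    ~ToyResumeTheWorldScope()
    {
        if (m_didResume)
            m_heap.stopTheWorld();
    }

    ToyResumeTheWorldScope(const ToyResumeTheWorldScope&) = delete;
    ToyResumeTheWorldScope& operator=(const ToyResumeTheWorldScope&) = delete;

private:
    ToyHeap& m_heap;
    bool m_didResume;
};
```

In this model, marking code wrapped in the scope runs with the world stopped by default and runs concurrently with the mutator only when the flag is set.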
* API/JSVirtualMachine.mm:
(-[JSVirtualMachine isOldExternalObject:]):
* assembler/MacroAssemblerARMv7.h:
(JSC::MacroAssemblerARMv7::storeFence):
* bytecode/InlineAccess.cpp:
(JSC::InlineAccess::dumpCacheSizesAndCrash):
(JSC::InlineAccess::generateSelfPropertyAccess):
(JSC::InlineAccess::generateArrayLength):
* bytecode/ObjectAllocationProfile.h:
(JSC::ObjectAllocationProfile::offsetOfInlineCapacity):
(JSC::ObjectAllocationProfile::ObjectAllocationProfile):
(JSC::ObjectAllocationProfile::initialize):
(JSC::ObjectAllocationProfile::inlineCapacity):
(JSC::ObjectAllocationProfile::clear):
* bytecode/PolymorphicAccess.cpp:
(JSC::AccessCase::generateWithGuard):
(JSC::AccessCase::generateImpl):
* dfg/DFGArrayifySlowPathGenerator.h:
* dfg/DFGClobberize.h:
(JSC::DFG::clobberize):
* dfg/DFGOSRExitCompiler32_64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOSRExitCompiler64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOperations.cpp:
* dfg/DFGPlan.cpp:
(JSC::DFG::Plan::markCodeBlocks):
(JSC::DFG::Plan::rememberCodeBlocks):
* dfg/DFGPlan.h:
* dfg/DFGSpeculativeJIT.cpp:
(JSC::DFG::SpeculativeJIT::emitAllocateRawObject):
(JSC::DFG::SpeculativeJIT::checkArray):
(JSC::DFG::SpeculativeJIT::arrayify):
(JSC::DFG::SpeculativeJIT::compileMakeRope):
(JSC::DFG::SpeculativeJIT::compileNewFunctionCommon):
(JSC::DFG::SpeculativeJIT::compileCreateActivation):
(JSC::DFG::SpeculativeJIT::compileCreateDirectArguments):
(JSC::DFG::SpeculativeJIT::compileSpread):
(JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileNewStringObject):
(JSC::DFG::SpeculativeJIT::compileNewTypedArray):
(JSC::DFG::SpeculativeJIT::compileStoreBarrier):
* dfg/DFGSpeculativeJIT64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
(JSC::DFG::SpeculativeJIT::compileAllocateNewArrayWithSize):
* dfg/DFGTierUpCheckInjectionPhase.cpp:
(JSC::DFG::TierUpCheckInjectionPhase::run):
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::markCodeBlocks):
(JSC::DFG::Worklist::rememberCodeBlocks):
(JSC::DFG::markCodeBlocks):
(JSC::DFG::completeAllPlansForVM):
(JSC::DFG::rememberCodeBlocks):
* dfg/DFGWorklist.h:
* ftl/FTLAbstractHeapRepository.cpp:
(JSC::FTL::AbstractHeapRepository::AbstractHeapRepository):
(JSC::FTL::AbstractHeapRepository::computeRangesAndDecorateInstructions):
* ftl/FTLAbstractHeapRepository.h:
* ftl/FTLJITCode.cpp:
(JSC::FTL::JITCode::~JITCode):
* ftl/FTLLowerDFGToB3.cpp:
(JSC::FTL::DFG::LowerDFGToB3::compilePutStructure):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::compileNewFunction):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateDirectArguments):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateRest):
(JSC::FTL::DFG::LowerDFGToB3::compileNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArray):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayBuffer):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSize):
(JSC::FTL::DFG::LowerDFGToB3::compileNewTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::compileMakeRope):
(JSC::FTL::DFG::LowerDFGToB3::compileMultiPutByOffset):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::splatWords):
(JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::reallocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::allocateObject):
(JSC::FTL::DFG::LowerDFGToB3::isArrayType):
(JSC::FTL::DFG::LowerDFGToB3::emitStoreBarrier):
(JSC::FTL::DFG::LowerDFGToB3::mutatorFence):
(JSC::FTL::DFG::LowerDFGToB3::setButterfly):
* ftl/FTLOSRExitCompiler.cpp:
(JSC::FTL::compileStub):
* ftl/FTLOutput.cpp:
(JSC::FTL::Output::signExt32ToPtr):
(JSC::FTL::Output::fence):
* ftl/FTLOutput.h:
* heap/CellState.h:
* heap/GCSegmentedArray.h:
* heap/Heap.cpp:
(JSC::Heap::ResumeTheWorldScope::ResumeTheWorldScope):
(JSC::Heap::ResumeTheWorldScope::~ResumeTheWorldScope):
(JSC::Heap::Heap):
(JSC::Heap::~Heap):
(JSC::Heap::harvestWeakReferences):
(JSC::Heap::finalizeUnconditionalFinalizers):
(JSC::Heap::completeAllJITPlans):
(JSC::Heap::markToFixpoint):
(JSC::Heap::gatherStackRoots):
(JSC::Heap::beginMarking):
(JSC::Heap::visitConservativeRoots):
(JSC::Heap::visitCompilerWorklistWeakReferences):
(JSC::Heap::updateObjectCounts):
(JSC::Heap::endMarking):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::collectInThread):
(JSC::Heap::stopTheWorld):
(JSC::Heap::resumeTheWorld):
(JSC::Heap::setGCDidJIT):
(JSC::Heap::setNeedFinalize):
(JSC::Heap::setMutatorWaiting):
(JSC::Heap::clearMutatorWaiting):
(JSC::Heap::finalize):
(JSC::Heap::flushWriteBarrierBuffer):
(JSC::Heap::writeBarrierSlowPath):
(JSC::Heap::canCollect):
(JSC::Heap::reportExtraMemoryVisited):
(JSC::Heap::reportExternalMemoryVisited):
(JSC::Heap::notifyIsSafeToCollect):
(JSC::Heap::markRoots): Deleted.
(JSC::Heap::visitExternalRememberedSet): Deleted.
(JSC::Heap::visitSmallStrings): Deleted.
(JSC::Heap::visitProtectedObjects): Deleted.
(JSC::Heap::visitArgumentBuffers): Deleted.
(JSC::Heap::visitException): Deleted.
(JSC::Heap::visitStrongHandles): Deleted.
(JSC::Heap::visitHandleStack): Deleted.
(JSC::Heap::visitSamplingProfiler): Deleted.
(JSC::Heap::visitTypeProfiler): Deleted.
(JSC::Heap::visitShadowChicken): Deleted.
(JSC::Heap::traceCodeBlocksAndJITStubRoutines): Deleted.
(JSC::Heap::visitWeakHandles): Deleted.
(JSC::Heap::flushOldStructureIDTables): Deleted.
(JSC::Heap::stopAllocation): Deleted.
* heap/Heap.h:
(JSC::Heap::collectorSlotVisitor):
(JSC::Heap::mutatorMarkStack):
(JSC::Heap::mutatorShouldBeFenced):
(JSC::Heap::addressOfMutatorShouldBeFenced):
(JSC::Heap::slotVisitor): Deleted.
(JSC::Heap::notifyIsSafeToCollect): Deleted.
(JSC::Heap::barrierShouldBeFenced): Deleted.
(JSC::Heap::addressOfBarrierShouldBeFenced): Deleted.
* heap/MarkStack.cpp:
(JSC::MarkStackArray::transferTo):
* heap/MarkStack.h:
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::tryAllocateIn):
* heap/MarkedBlock.cpp:
(JSC::MarkedBlock::MarkedBlock):
(JSC::MarkedBlock::Handle::specializedSweep):
(JSC::MarkedBlock::Handle::sweep):
(JSC::MarkedBlock::Handle::sweepHelperSelectMarksMode):
(JSC::MarkedBlock::Handle::stopAllocating):
(JSC::MarkedBlock::Handle::resumeAllocating):
(JSC::MarkedBlock::aboutToMarkSlow):
(JSC::MarkedBlock::Handle::didConsumeFreeList):
(JSC::SetNewlyAllocatedFunctor::SetNewlyAllocatedFunctor): Deleted.
(JSC::SetNewlyAllocatedFunctor::operator()): Deleted.
* heap/MarkedBlock.h:
* heap/MarkedSpace.cpp:
(JSC::MarkedSpace::resumeAllocating):
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::SlotVisitor):
(JSC::SlotVisitor::~SlotVisitor):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::clearMarkStacks):
(JSC::SlotVisitor::appendJSCellOrAuxiliary):
(JSC::SlotVisitor::setMarkedAndAppendToMarkStack):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::appendToMutatorMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::donateKnownParallel):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::containsOpaqueRoot):
(JSC::SlotVisitor::donateAndDrain):
(JSC::SlotVisitor::mergeOpaqueRoots):
(JSC::SlotVisitor::dump):
(JSC::SlotVisitor::clearMarkStack): Deleted.
(JSC::SlotVisitor::opaqueRootCount): Deleted.
* heap/SlotVisitor.h:
(JSC::SlotVisitor::collectorMarkStack):
(JSC::SlotVisitor::mutatorMarkStack):
(JSC::SlotVisitor::isEmpty):
(JSC::SlotVisitor::bytesVisited):
(JSC::SlotVisitor::markStack): Deleted.
(JSC::SlotVisitor::bytesCopied): Deleted.
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::reportExtraMemoryVisited):
(JSC::SlotVisitor::reportExternalMemoryVisited):
* jit/AssemblyHelpers.cpp:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
* jit/AssemblyHelpers.h:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
(JSC::AssemblyHelpers::barrierStoreLoadFence):
(JSC::AssemblyHelpers::mutatorFence):
(JSC::AssemblyHelpers::storeButterfly):
(JSC::AssemblyHelpers::jumpIfMutatorFenceNotNeeded):
(JSC::AssemblyHelpers::emitInitializeInlineStorage):
(JSC::AssemblyHelpers::emitInitializeOutOfLineStorage):
(JSC::AssemblyHelpers::jumpIfBarrierStoreLoadFenceNotNeeded): Deleted.
* jit/JITInlines.h:
(JSC::JIT::emitArrayProfilingSiteWithCell):
* jit/JITOperations.cpp:
* jit/JITPropertyAccess.cpp:
(JSC::JIT::emit_op_put_to_scope):
(JSC::JIT::emit_op_put_to_arguments):
* llint/LLIntData.cpp:
(JSC::LLInt::Data::performAssertions):
* llint/LowLevelInterpreter.asm:
* llint/LowLevelInterpreter64.asm:
* runtime/ButterflyInlines.h:
(JSC::Butterfly::create):
(JSC::Butterfly::createOrGrowPropertyStorage):
* runtime/ConcurrentJITLock.h:
(JSC::GCSafeConcurrentJITLocker::NoDefer::NoDefer): Deleted.
* runtime/GenericArgumentsInlines.h:
(JSC::GenericArguments<Type>::getOwnPropertySlotByIndex):
(JSC::GenericArguments<Type>::putByIndex):
* runtime/IndexingType.h:
* runtime/JSArray.cpp:
(JSC::JSArray::unshiftCountSlowCase):
(JSC::JSArray::unshiftCountWithArrayStorage):
* runtime/JSCell.h:
(JSC::JSCell::InternalLocker::InternalLocker):
(JSC::JSCell::InternalLocker::~InternalLocker):
(JSC::JSCell::atomicCompareExchangeCellStateWeakRelaxed):
(JSC::JSCell::atomicCompareExchangeCellStateStrong):
(JSC::JSCell::indexingTypeAndMiscOffset):
(JSC::JSCell::indexingTypeOffset): Deleted.
* runtime/JSCellInlines.h:
(JSC::JSCell::JSCell):
(JSC::JSCell::finishCreation):
(JSC::JSCell::indexingTypeAndMisc):
(JSC::JSCell::indexingType):
(JSC::JSCell::setStructure):
(JSC::JSCell::callDestructor):
(JSC::JSCell::lockInternalLock):
(JSC::JSCell::unlockInternalLock):
* runtime/JSObject.cpp:
(JSC::JSObject::visitButterfly):
(JSC::JSObject::visitChildren):
(JSC::JSFinalObject::visitChildren):
(JSC::JSObject::enterDictionaryIndexingModeWhenArrayStorageAlreadyExists):
(JSC::JSObject::createInitialUndecided):
(JSC::JSObject::createInitialInt32):
(JSC::JSObject::createInitialDouble):
(JSC::JSObject::createInitialContiguous):
(JSC::JSObject::createArrayStorage):
(JSC::JSObject::convertUndecidedToArrayStorage):
(JSC::JSObject::convertInt32ToArrayStorage):
(JSC::JSObject::convertDoubleToArrayStorage):
(JSC::JSObject::convertContiguousToArrayStorage):
(JSC::JSObject::deleteProperty):
(JSC::JSObject::defineOwnIndexedProperty):
(JSC::JSObject::increaseVectorLength):
(JSC::JSObject::ensureLengthSlow):
(JSC::JSObject::reallocateAndShrinkButterfly):
(JSC::JSObject::allocateMoreOutOfLineStorage):
(JSC::JSObject::shiftButterflyAfterFlattening):
(JSC::JSObject::growOutOfLineStorage): Deleted.
* runtime/JSObject.h:
(JSC::JSFinalObject::JSFinalObject):
(JSC::JSObject::setButterfly):
(JSC::JSObject::getOwnNonIndexPropertySlot):
(JSC::JSObject::fillCustomGetterPropertySlot):
(JSC::JSObject::getOwnPropertySlot):
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::setStructureAndButterfly): Deleted.
(JSC::JSObject::setButterflyWithoutChangingStructure): Deleted.
(JSC::JSObject::putDirectInternal): Deleted.
(JSC::JSObject::putDirectWithoutTransition): Deleted.
* runtime/JSObjectInlines.h:
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::getNonIndexPropertySlot):
(JSC::JSObject::putDirectWithoutTransition):
(JSC::JSObject::putDirectInternal):
* runtime/Options.h:
* runtime/SparseArrayValueMap.h:
* runtime/Structure.cpp:
(JSC::Structure::dumpStatistics):
(JSC::Structure::findStructuresAndMapForMaterialization):
(JSC::Structure::materializePropertyTable):
(JSC::Structure::addNewPropertyTransition):
(JSC::Structure::changePrototypeTransition):
(JSC::Structure::attributeChangeTransition):
(JSC::Structure::toDictionaryTransition):
(JSC::Structure::takePropertyTableOrCloneIfPinned):
(JSC::Structure::nonPropertyTransition):
(JSC::Structure::isSealed):
(JSC::Structure::isFrozen):
(JSC::Structure::flattenDictionaryStructure):
(JSC::Structure::pin):
(JSC::Structure::pinForCaching):
(JSC::Structure::willStoreValueSlow):
(JSC::Structure::copyPropertyTableForPinning):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::getPropertyNamesFromStructure):
(JSC::Structure::visitChildren):
(JSC::Structure::materializePropertyMap): Deleted.
(JSC::Structure::addPropertyWithoutTransition): Deleted.
(JSC::Structure::removePropertyWithoutTransition): Deleted.
(JSC::Structure::copyPropertyTable): Deleted.
(JSC::Structure::createPropertyMap): Deleted.
(JSC::PropertyTable::checkConsistency): Deleted.
(JSC::Structure::checkConsistency): Deleted.
* runtime/Structure.h:
* runtime/StructureIDBlob.h:
(JSC::StructureIDBlob::StructureIDBlob):
(JSC::StructureIDBlob::indexingTypeIncludingHistory):
(JSC::StructureIDBlob::setIndexingTypeIncludingHistory):
(JSC::StructureIDBlob::indexingTypeIncludingHistoryOffset):
(JSC::StructureIDBlob::indexingType): Deleted.
(JSC::StructureIDBlob::setIndexingType): Deleted.
(JSC::StructureIDBlob::indexingTypeOffset): Deleted.
* runtime/StructureInlines.h:
(JSC::Structure::get):
(JSC::Structure::checkOffsetConsistency):
(JSC::Structure::checkConsistency):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::addPropertyWithoutTransition):
(JSC::Structure::removePropertyWithoutTransition):
(JSC::Structure::setPropertyTable):
(JSC::Structure::putWillGrowOutOfLineStorage): Deleted.
(JSC::Structure::propertyTable): Deleted.
(JSC::Structure::suggestedNewOutOfLineStorageCapacity): Deleted.
Source/WTF:
The reason why I went to such great pains to make WTF::Lock fit in two bits is that I
knew that I would eventually need to stuff one into some miscellaneous bits of the
JSCell header. That time has come, because the concurrent GC has numerous race
conditions in visitChildren that can be trivially fixed if each object just has an
internal lock. Some cell types might use it to simply protect their entire visitChildren
function and anything that mutates the fields it touches, while other cell types might
use it as a "lock of last resort" to handle corner cases of an otherwise wait-free or
lock-free algorithm. Right now, it's used to protect certain transformations involving
indexing storage.
To make this happen, I factored the WTF::Lock algorithm into a LockAlgorithm struct that
is templatized on lock type (uint8_t for WTF::Lock), the isHeldBit value (1 for
WTF::Lock), and the hasParkedBit value (2 for WTF::Lock). This could have been done as
a templatized Lock class that basically contains Atomic<LockType>. You could then make
any field into a lock by bitwise_casting it to TemplateLock<field type, bit1, bit2>. But
this felt too dirty, so instead, LockAlgorithm has static methods that take
Atomic<LockType>& as their first argument. I think that this makes it more natural to
project a LockAlgorithm onto an existing Atomic<> field. Sadly, some places have to cast
their non-Atomic<> field to Atomic<> in order for this to work. Like so many other things
we do, this just shows that the C++ style of labeling fields that are subject to atomic
ops as atomic is counterproductive. Maybe some day I'll change LockAlgorithm to use our
other Atomics API, which does not require Atomic<>.
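A minimal sketch of the "static methods projected onto an existing atomic field" idea follows. It is not the contents of wtf/LockAlgorithm.h: ToyLockAlgorithm is a hypothetical name, std::atomic stands in for WTF's Atomic<>, and the slow path just yields instead of parking, so hasParkedBit is reserved but never set here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical simplification: the lock lives in two bits of a caller-owned word.
template<typename LockType, LockType isHeldBit, LockType hasParkedBit>
struct ToyLockAlgorithm {
    static_assert(!(isHeldBit & hasParkedBit), "the two lock bits must be distinct");

    static bool tryLock(std::atomic<LockType>& word)
    {
        LockType oldValue = word.load(std::memory_order_relaxed);
        for (;;) {
            if (oldValue & isHeldBit)
                return false;
            // Set only the held bit and leave every unrelated bit untouched.
            // On failure, oldValue is refreshed with the current word.
            if (word.compare_exchange_weak(oldValue, static_cast<LockType>(oldValue | isHeldBit),
                    std::memory_order_acquire, std::memory_order_relaxed))
                return true;
        }
    }

    static void lock(std::atomic<LockType>& word)
    {
        // Simplification: spin with yield rather than parking the thread.
        while (!tryLock(word))
            std::this_thread::yield();
    }

    static void unlock(std::atomic<LockType>& word)
    {
        word.fetch_and(static_cast<LockType>(~isHeldBit), std::memory_order_release);
    }

    static bool isLocked(const std::atomic<LockType>& word)
    {
        return (word.load(std::memory_order_acquire) & isHeldBit) != 0;
    }
};
```

Because the methods take the atomic word as their first argument, the same algorithm can be applied to a byte whose other bits carry unrelated data, which is the property the JSCell header use case depends on.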
WTF::Lock now uses LockAlgorithm. The slow paths are still outlined. I don't feel too
bad about the LockAlgorithm.h header being included in so many places because we change
that algorithm so infrequently.
Also, I added a hasElapsed(time) function. This function makes it so much more natural
to write timeslicing code, which the concurrent GC has to do a lot of.
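To illustrate why a hasElapsed-style predicate makes timeslicing read naturally, here is a small sketch. It does not use WTF's TimeWithDynamicClockType. Instead, hasElapsedSketch and runSlice are hypothetical helpers built on std::chrono::steady_clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Stand-in for a hasElapsed(time) predicate: true once the deadline has passed.
inline bool hasElapsedSketch(Clock::time_point deadline)
{
    return Clock::now() >= deadline;
}

// Processes work items starting at `next` until the work runs out or the
// slice deadline passes. Returns the index at which to resume next time.
std::size_t runSlice(const std::vector<int>& work, std::size_t next, Clock::time_point deadline, long& sum)
{
    while (next < work.size() && !hasElapsedSketch(deadline))
        sum += work[next++];
    return next;
}
```

A caller would compute a deadline such as Clock::now() plus a slice length, call runSlice, let other work proceed, and then resume from the returned index.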
* WTF.xcodeproj/project.pbxproj:
* wtf/CMakeLists.txt:
* wtf/ListDump.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl): Deleted.
* wtf/Lock.h:
(WTF::LockBase::lock):
(WTF::LockBase::tryLock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(): Deleted.
* wtf/LockAlgorithm.h: Added.
(WTF::LockAlgorithm::lockFastAssumingZero):
(WTF::LockAlgorithm::lockFast):
(WTF::LockAlgorithm::lock):
(WTF::LockAlgorithm::tryLock):
(WTF::LockAlgorithm::unlockFastAssumingZero):
(WTF::LockAlgorithm::unlockFast):
(WTF::LockAlgorithm::unlock):
(WTF::LockAlgorithm::unlockFairly):
(WTF::LockAlgorithm::isLocked):
(WTF::LockAlgorithm::lockSlow):
(WTF::LockAlgorithm::unlockSlow):
* wtf/TimeWithDynamicClockType.cpp:
(WTF::hasElapsed):
* wtf/TimeWithDynamicClockType.h:
Canonical link: https://commits.webkit.org/182434@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@208720 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-11-15 01:49:22 +00:00
return DefaultLockAlgorithm::isLocked(m_byte);
}
bool isLocked() const
{
return isHeld();
}
2017-08-06 04:43:37 +00:00
private:
WTF::Lock should not suffer from the thundering herd
https://bugs.webkit.org/show_bug.cgi?id=147947
Reviewed by Geoffrey Garen.
Source/WTF:
This changes Lock::unlockSlow() to use unparkOne() instead of unparkAll(). The problem with
doing this is that it's not obvious after calling unparkOne() if there are any other threads
that are still parked on the lock's queue. If we assume that there are and leave the
hasParkedBit set, then future calls to unlock() will take the slow path. We don't want that
if there aren't actually any threads parked. On the other hand, if we assume that there
aren't any threads parked and clear the hasParkedBit, then if there actually were some
threads parked, then they may never be awoken since future calls to unlock() won't take the slow
path and so won't call unparkOne(). In other words, we need a way to be very precise about
when we clear the hasParkedBit and we need to do it in a race-free way: it can't be the case
that we clear the bit just as some thread gets parked on the queue.
A similar problem arises in futexes, and one of the solutions is to have a thread that
acquires a lock after parking sets the hasParkedBit. This is what Rusty Russell's usersem
does. It's a subtle algorithm. Also, it means that if a thread barges in before the unparked
thread runs, then that barging thread will not know that there are threads parked. This
could increase the severity of barging.
Since ParkingLot is a user-level API, we don't have to worry about the kernel-user security
issues and so we can expose callbacks while ParkingLot is holding its internal locks. This
change does exactly that for unparkOne(). The new variant of unparkOne() will call a user
function while the queue from which we are unparking is locked. The callback is told basic
stats about the queue: did we unpark a thread this time, and could there be more threads to
unpark in the future. The callback runs while it's impossible for the queue state to change,
since the ParkingLot's internal locks for the queue are held. This means that
Lock::unlockSlow() can either clear, or leave, the hasParkedBit while releasing the lock
inside the callback from unparkOne(). This takes care of the thundering herd problem while
also reducing the greed that arises from barging threads.
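The callback-under-the-queue-lock idea can be sketched as below. This is a single-queue toy and not WTF::ParkingLot: ToyQueue, its park() that merely records an integer waiter id instead of sleeping a thread, and the free function unlockSlow are all assumptions made for illustration. What it tries to show is that the lock byte is updated inside the callback, while the queue cannot change, so hasParkedBit is cleared exactly when the queue becomes empty.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>

constexpr uint8_t isHeldBit = 1;
constexpr uint8_t hasParkedBit = 2;

struct UnparkResult {
    bool didUnparkThread = false;    // Did this call wake a waiter?
    bool mayHaveMoreThreads = false; // Could more waiters remain on the queue?
};

// Toy queue: waiters are integer ids rather than sleeping threads.
class ToyQueue {
public:
    void park(int waiterId)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        m_waiters.push_back(waiterId);
    }

    // Dequeues at most one waiter, then runs the callback while the queue lock
    // is still held, so the queue state reported to the callback cannot change.
    template<typename Callback>
    void unparkOne(const Callback& callback)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        UnparkResult result;
        if (!m_waiters.empty()) {
            m_lastWoken = m_waiters.front();
            m_waiters.pop_front();
            result.didUnparkThread = true;
        }
        result.mayHaveMoreThreads = !m_waiters.empty();
        callback(result);
    }

    int lastWoken() const { return m_lastWoken; }

private:
    std::mutex m_lock;
    std::deque<int> m_waiters;
    int m_lastWoken = -1;
};

// Releases the lock and decides, race-free, whether hasParkedBit stays set.
void unlockSlow(std::atomic<uint8_t>& byte, ToyQueue& queue)
{
    queue.unparkOne([&byte](UnparkResult result) {
        uint8_t newValue = result.mayHaveMoreThreads ? hasParkedBit : static_cast<uint8_t>(0);
        byte.store(newValue, std::memory_order_release);
    });
}
```

With two waiters parked, the first unlockSlow() wakes one and leaves hasParkedBit set. The second wakes the other and returns the byte to zero, so later unlocks can stay on the fast path.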
This required some careful reworking of the ParkingLot algorithm. The first thing I noticed
was that the ThreadData::shouldPark flag was useless, since it's set exactly when
ThreadData::address is non-null. Then I had to make sure that dequeue() could lazily create
both hashtables and buckets, since the "callback is called while queue is locked" invariant
requires that we didn't exit early due to the hashtable or bucket not being present. Note
that all of this is done in such a way that the old unparkOne() and unparkAll() don't have
to create any buckets, though they now may create the hashtable. We don't care as much about
the hashtable being created by unpark since it's just such an unlikely scenario and it would
only happen once.
This change reduces the kernel CPU usage of WTF::Lock for the long critical section test by
about 8x and makes it always perform as well as WTF::WordLock and WTF::Mutex for that
benchmark.
* benchmarks/LockSpeedTest.cpp:
* wtf/Lock.cpp:
(WTF::LockBase::unlockSlow):
* wtf/Lock.h:
(WTF::LockBase::isLocked):
(WTF::LockBase::isFullyReset):
* wtf/ParkingLot.cpp:
(WTF::ParkingLot::parkConditionally):
(WTF::ParkingLot::unparkOne):
(WTF::ParkingLot::unparkAll):
* wtf/ParkingLot.h:
* wtf/WordLock.h:
(WTF::WordLock::isLocked):
(WTF::WordLock::isFullyReset):
Tools:
Add testing that checks that locks return to a pristine state after contention is over.
* TestWebKitAPI/Tests/WTF/Lock.cpp:
(TestWebKitAPI::LockInspector::isFullyReset):
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/166072@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-13 03:51:25 +00:00
|
|
|
friend struct TestWebKitAPI::LockInspector;
|
|
|
|
|
2019-09-18 00:36:19 +00:00
|
|
|
static constexpr uint8_t isHeldBit = 1;
|
|
|
|
static constexpr uint8_t hasParkedBit = 2;
|
The GC should be optionally concurrent and disabled by default
https://bugs.webkit.org/show_bug.cgi?id=164454
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
This started out as a patch to have the GC scan the stack at the end, and then the
outage happened and I decided to pick a more aggressive target: give the GC a concurrent
mode that can be enabled at runtime, and whose only effect is that it turns on the
ResumeTheWorldScope. This gives our GC a really intuitive workflow: by default, the GC
thread is running solo with the world stopped and the parallel markers converged and
waiting. We have a parallel work scope to enable the parallel markers and now we have a
ResumeTheWorldScope that will optionally resume the world and then stop it again.
It's easy to make a concurrent GC that always instantly crashes. I can't promise that
this one won't do that when you run it. I set a specific goal: I wanted to do >10
concurrent GCs in debug mode with generations, optimizing JITs, and parallel marking
disabled.
To reach this milestone, I needed to do a bunch of stuff:
- The mutator needs a separate mark stack for the barrier, since it will mutate this
stack concurrently to the collector's slot visitors.
- The use of CellState to indicate whether an object is being scanned the first time or
a subsequent time was racy. It fails spectacularly when a barrier is fired at the same
time as visitChildren is running or if the barrier runs at the same time as the GC
marks the same object. So, I split SlotVisitor's mark stacks. It's now the case that
you know why you're being scanned by looking at which stack you came off of.
- All of root marking must be in the collector fixpoint. I renamed markRoots to
markToFixpoint. They say concurrency is hard, but the collector looks more intuitive
this way. We never gained anything from forcing people to make a choice between
scanning something in the fixpoint versus outside of it. Because root scanning is
cheap, we can afford to do it repeatedly, which means all root scanning can now do
constraint-based marking (like: I'll mark you if that thing is marked).
- JSObject::visitChildren's scanning of the butterfly raced with property additions,
indexed storage transitions and resizing, and a bunch of miscellaneous dirty butterfly
reshaping functions - like the one that flattens a dictionary and some sneaky
ArrayStorage transformations. Many of these can be fixed by using store-store fences
in the mutator and load-load fences in the collector. I've adopted the rule that the
collector must always see either a butterfly and structure that match or a newer
butterfly with an older structure, where their age is just one transition apart. This
can be achieved with fences. For the cases where it breaks down, I added a lock to
every JSCell. This is a full-fledged WTF lock that we sneak into two available bits in
the indexingType. See the WTF ChangeLog for details.
The mutator fencing rules are as follows:
- Store-store fence before and after setting the butterfly.
- Store-store fence before setting structure if you had changed the shape of the
butterfly.
- Store-store fence after initializing all fields in an allocation.
- A dictionary Structure can change in strange ways while the GC is trying to scan it.
So, JSObject::visitChildren will now grab the object's structure's lock if the
object's structure is a dictionary. Dictionary structures are 1:1 with their object,
so this does not reduce GC parallelism (super unlikely that the GC will simultaneously
scan an object from two threads).
- The GC can blow away a Structure's property table at any time. As a small consolation,
it's now holding the Structure's lock when it does so. But there was tons of code in
Structure that uses DeferGC to prevent the GC from blowing away the property table.
This doesn't work with concurrent GC, since DeferGC only means that the GC won't run
its safepoint (i.e. stop-the-world code) in the DeferGC region. It will still do
marking and it was the Structure::visitChildren that would delete the table. It turns
out that Structure's reliance on the property table not being deleted was the product
of code rot. We already had functions that would materialize the table on demand. We
were simply making the mistake of saying:
structure->materializePropertyMap();
...
structure->propertyTable()->things
Instead of saying:
PropertyTable* table = structure->ensurePropertyTable();
...
table->things
Switching the code to use the latter idiom allowed me to simplify the code a lot while
fixing the race.
- The LLInt's get_by_val handling was broken because the indexing shape constants were
wrong. Once I started putting more things into the IndexingType, that started causing
crashes for me. So I fixed LLInt. That turned out to be a lot of work, since that code
had rotted in subtle ways.
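The mutator fencing rules listed above can be sketched with standard C++ fences. This is a
rough illustration, not JSC code: ToyObject, transition and snapshot are hypothetical names,
the layout is heavily simplified, and release/acquire fences are used as stand-ins for
store-store and load-load fences (they are stronger than strictly required). The point is
the ordering: the collector reads the structure before the butterfly, so it sees either a
matching pair or a newer butterfly with an older structure.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, heavily simplified object layout; not JSC's real JSObject.
struct ToyObject {
    std::atomic<int> structureID { 0 };
    std::atomic<int*> butterfly { nullptr };
};

// Stand-in for a store-store fence. A release fence is stronger than strictly needed.
inline void storeStoreFence() { std::atomic_thread_fence(std::memory_order_release); }
// Stand-in for a load-load fence. An acquire fence is stronger than strictly needed.
inline void loadLoadFence() { std::atomic_thread_fence(std::memory_order_acquire); }

// Mutator side: install a reshaped butterfly, then the structure that describes it.
inline void transition(ToyObject& object, int* newButterfly, int newStructureID)
{
    assert(newButterfly);
    storeStoreFence(); // The butterfly's contents are initialized before it is published.
    object.butterfly.store(newButterfly, std::memory_order_relaxed);
    storeStoreFence(); // The butterfly is visible before the structure that requires it.
    object.structureID.store(newStructureID, std::memory_order_relaxed);
}

struct Snapshot {
    int structureID;
    int* butterfly;
};

// Collector side: read the structure first, then the butterfly.
inline Snapshot snapshot(const ToyObject& object)
{
    Snapshot result;
    result.structureID = object.structureID.load(std::memory_order_relaxed);
    loadLoadFence(); // The butterfly load cannot be reordered before the structure load.
    result.butterfly = object.butterfly.load(std::memory_order_relaxed);
    return result;
}
```

A single-threaded check such as calling transition() and then snapshot() only confirms the
plumbing. The ordering guarantee matters when the two run on different threads.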
This is a speed-up in SunSpider, probably because of the LLInt fix. This is neutral on
Octane and Kraken. It's a small slow-down on LongSpider, but I think we can ignore
that (we don't view LongSpider as an official benchmark). By default, the concurrent GC
is disabled: in all of the places where it would have resumed the world to run marking
concurrently to the mutator, it will just skip the resume step. When you enable
concurrent GC (--useConcurrentGC=true), it can sometimes run Octane/splay to completion.
It seems to perform quite well: on my machine, it improves both splay-throughput and
splay-latency. It's probably unstable for other programs.
* API/JSVirtualMachine.mm:
(-[JSVirtualMachine isOldExternalObject:]):
* assembler/MacroAssemblerARMv7.h:
(JSC::MacroAssemblerARMv7::storeFence):
* bytecode/InlineAccess.cpp:
(JSC::InlineAccess::dumpCacheSizesAndCrash):
(JSC::InlineAccess::generateSelfPropertyAccess):
(JSC::InlineAccess::generateArrayLength):
* bytecode/ObjectAllocationProfile.h:
(JSC::ObjectAllocationProfile::offsetOfInlineCapacity):
(JSC::ObjectAllocationProfile::ObjectAllocationProfile):
(JSC::ObjectAllocationProfile::initialize):
(JSC::ObjectAllocationProfile::inlineCapacity):
(JSC::ObjectAllocationProfile::clear):
* bytecode/PolymorphicAccess.cpp:
(JSC::AccessCase::generateWithGuard):
(JSC::AccessCase::generateImpl):
* dfg/DFGArrayifySlowPathGenerator.h:
* dfg/DFGClobberize.h:
(JSC::DFG::clobberize):
* dfg/DFGOSRExitCompiler32_64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOSRExitCompiler64.cpp:
(JSC::DFG::OSRExitCompiler::compileExit):
* dfg/DFGOperations.cpp:
* dfg/DFGPlan.cpp:
(JSC::DFG::Plan::markCodeBlocks):
(JSC::DFG::Plan::rememberCodeBlocks):
* dfg/DFGPlan.h:
* dfg/DFGSpeculativeJIT.cpp:
(JSC::DFG::SpeculativeJIT::emitAllocateRawObject):
(JSC::DFG::SpeculativeJIT::checkArray):
(JSC::DFG::SpeculativeJIT::arrayify):
(JSC::DFG::SpeculativeJIT::compileMakeRope):
(JSC::DFG::SpeculativeJIT::compileNewFunctionCommon):
(JSC::DFG::SpeculativeJIT::compileCreateActivation):
(JSC::DFG::SpeculativeJIT::compileCreateDirectArguments):
(JSC::DFG::SpeculativeJIT::compileSpread):
(JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileNewStringObject):
(JSC::DFG::SpeculativeJIT::compileNewTypedArray):
(JSC::DFG::SpeculativeJIT::compileStoreBarrier):
* dfg/DFGSpeculativeJIT64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
(JSC::DFG::SpeculativeJIT::compileAllocateNewArrayWithSize):
* dfg/DFGTierUpCheckInjectionPhase.cpp:
(JSC::DFG::TierUpCheckInjectionPhase::run):
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::markCodeBlocks):
(JSC::DFG::Worklist::rememberCodeBlocks):
(JSC::DFG::markCodeBlocks):
(JSC::DFG::completeAllPlansForVM):
(JSC::DFG::rememberCodeBlocks):
* dfg/DFGWorklist.h:
* ftl/FTLAbstractHeapRepository.cpp:
(JSC::FTL::AbstractHeapRepository::AbstractHeapRepository):
(JSC::FTL::AbstractHeapRepository::computeRangesAndDecorateInstructions):
* ftl/FTLAbstractHeapRepository.h:
* ftl/FTLJITCode.cpp:
(JSC::FTL::JITCode::~JITCode):
* ftl/FTLLowerDFGToB3.cpp:
(JSC::FTL::DFG::LowerDFGToB3::compilePutStructure):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::compileNewFunction):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateDirectArguments):
(JSC::FTL::DFG::LowerDFGToB3::compileCreateRest):
(JSC::FTL::DFG::LowerDFGToB3::compileNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArray):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileSpread):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayBuffer):
(JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSize):
(JSC::FTL::DFG::LowerDFGToB3::compileNewTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::compileMakeRope):
(JSC::FTL::DFG::LowerDFGToB3::compileMultiPutByOffset):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeNewObject):
(JSC::FTL::DFG::LowerDFGToB3::compileMaterializeCreateActivation):
(JSC::FTL::DFG::LowerDFGToB3::splatWords):
(JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::reallocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::allocateObject):
(JSC::FTL::DFG::LowerDFGToB3::isArrayType):
(JSC::FTL::DFG::LowerDFGToB3::emitStoreBarrier):
(JSC::FTL::DFG::LowerDFGToB3::mutatorFence):
(JSC::FTL::DFG::LowerDFGToB3::setButterfly):
* ftl/FTLOSRExitCompiler.cpp:
(JSC::FTL::compileStub):
* ftl/FTLOutput.cpp:
(JSC::FTL::Output::signExt32ToPtr):
(JSC::FTL::Output::fence):
* ftl/FTLOutput.h:
* heap/CellState.h:
* heap/GCSegmentedArray.h:
* heap/Heap.cpp:
(JSC::Heap::ResumeTheWorldScope::ResumeTheWorldScope):
(JSC::Heap::ResumeTheWorldScope::~ResumeTheWorldScope):
(JSC::Heap::Heap):
(JSC::Heap::~Heap):
(JSC::Heap::harvestWeakReferences):
(JSC::Heap::finalizeUnconditionalFinalizers):
(JSC::Heap::completeAllJITPlans):
(JSC::Heap::markToFixpoint):
(JSC::Heap::gatherStackRoots):
(JSC::Heap::beginMarking):
(JSC::Heap::visitConservativeRoots):
(JSC::Heap::visitCompilerWorklistWeakReferences):
(JSC::Heap::updateObjectCounts):
(JSC::Heap::endMarking):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::collectInThread):
(JSC::Heap::stopTheWorld):
(JSC::Heap::resumeTheWorld):
(JSC::Heap::setGCDidJIT):
(JSC::Heap::setNeedFinalize):
(JSC::Heap::setMutatorWaiting):
(JSC::Heap::clearMutatorWaiting):
(JSC::Heap::finalize):
(JSC::Heap::flushWriteBarrierBuffer):
(JSC::Heap::writeBarrierSlowPath):
(JSC::Heap::canCollect):
(JSC::Heap::reportExtraMemoryVisited):
(JSC::Heap::reportExternalMemoryVisited):
(JSC::Heap::notifyIsSafeToCollect):
(JSC::Heap::markRoots): Deleted.
(JSC::Heap::visitExternalRememberedSet): Deleted.
(JSC::Heap::visitSmallStrings): Deleted.
(JSC::Heap::visitProtectedObjects): Deleted.
(JSC::Heap::visitArgumentBuffers): Deleted.
(JSC::Heap::visitException): Deleted.
(JSC::Heap::visitStrongHandles): Deleted.
(JSC::Heap::visitHandleStack): Deleted.
(JSC::Heap::visitSamplingProfiler): Deleted.
(JSC::Heap::visitTypeProfiler): Deleted.
(JSC::Heap::visitShadowChicken): Deleted.
(JSC::Heap::traceCodeBlocksAndJITStubRoutines): Deleted.
(JSC::Heap::visitWeakHandles): Deleted.
(JSC::Heap::flushOldStructureIDTables): Deleted.
(JSC::Heap::stopAllocation): Deleted.
* heap/Heap.h:
(JSC::Heap::collectorSlotVisitor):
(JSC::Heap::mutatorMarkStack):
(JSC::Heap::mutatorShouldBeFenced):
(JSC::Heap::addressOfMutatorShouldBeFenced):
(JSC::Heap::slotVisitor): Deleted.
(JSC::Heap::notifyIsSafeToCollect): Deleted.
(JSC::Heap::barrierShouldBeFenced): Deleted.
(JSC::Heap::addressOfBarrierShouldBeFenced): Deleted.
* heap/MarkStack.cpp:
(JSC::MarkStackArray::transferTo):
* heap/MarkStack.h:
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::tryAllocateIn):
* heap/MarkedBlock.cpp:
(JSC::MarkedBlock::MarkedBlock):
(JSC::MarkedBlock::Handle::specializedSweep):
(JSC::MarkedBlock::Handle::sweep):
(JSC::MarkedBlock::Handle::sweepHelperSelectMarksMode):
(JSC::MarkedBlock::Handle::stopAllocating):
(JSC::MarkedBlock::Handle::resumeAllocating):
(JSC::MarkedBlock::aboutToMarkSlow):
(JSC::MarkedBlock::Handle::didConsumeFreeList):
(JSC::SetNewlyAllocatedFunctor::SetNewlyAllocatedFunctor): Deleted.
(JSC::SetNewlyAllocatedFunctor::operator()): Deleted.
* heap/MarkedBlock.h:
* heap/MarkedSpace.cpp:
(JSC::MarkedSpace::resumeAllocating):
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::SlotVisitor):
(JSC::SlotVisitor::~SlotVisitor):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::clearMarkStacks):
(JSC::SlotVisitor::appendJSCellOrAuxiliary):
(JSC::SlotVisitor::setMarkedAndAppendToMarkStack):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::appendToMutatorMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::donateKnownParallel):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::containsOpaqueRoot):
(JSC::SlotVisitor::donateAndDrain):
(JSC::SlotVisitor::mergeOpaqueRoots):
(JSC::SlotVisitor::dump):
(JSC::SlotVisitor::clearMarkStack): Deleted.
(JSC::SlotVisitor::opaqueRootCount): Deleted.
* heap/SlotVisitor.h:
(JSC::SlotVisitor::collectorMarkStack):
(JSC::SlotVisitor::mutatorMarkStack):
(JSC::SlotVisitor::isEmpty):
(JSC::SlotVisitor::bytesVisited):
(JSC::SlotVisitor::markStack): Deleted.
(JSC::SlotVisitor::bytesCopied): Deleted.
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::reportExtraMemoryVisited):
(JSC::SlotVisitor::reportExternalMemoryVisited):
* jit/AssemblyHelpers.cpp:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
* jit/AssemblyHelpers.h:
(JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo):
(JSC::AssemblyHelpers::barrierStoreLoadFence):
(JSC::AssemblyHelpers::mutatorFence):
(JSC::AssemblyHelpers::storeButterfly):
(JSC::AssemblyHelpers::jumpIfMutatorFenceNotNeeded):
(JSC::AssemblyHelpers::emitInitializeInlineStorage):
(JSC::AssemblyHelpers::emitInitializeOutOfLineStorage):
(JSC::AssemblyHelpers::jumpIfBarrierStoreLoadFenceNotNeeded): Deleted.
* jit/JITInlines.h:
(JSC::JIT::emitArrayProfilingSiteWithCell):
* jit/JITOperations.cpp:
* jit/JITPropertyAccess.cpp:
(JSC::JIT::emit_op_put_to_scope):
(JSC::JIT::emit_op_put_to_arguments):
* llint/LLIntData.cpp:
(JSC::LLInt::Data::performAssertions):
* llint/LowLevelInterpreter.asm:
* llint/LowLevelInterpreter64.asm:
* runtime/ButterflyInlines.h:
(JSC::Butterfly::create):
(JSC::Butterfly::createOrGrowPropertyStorage):
* runtime/ConcurrentJITLock.h:
(JSC::GCSafeConcurrentJITLocker::NoDefer::NoDefer): Deleted.
* runtime/GenericArgumentsInlines.h:
(JSC::GenericArguments<Type>::getOwnPropertySlotByIndex):
(JSC::GenericArguments<Type>::putByIndex):
* runtime/IndexingType.h:
* runtime/JSArray.cpp:
(JSC::JSArray::unshiftCountSlowCase):
(JSC::JSArray::unshiftCountWithArrayStorage):
* runtime/JSCell.h:
(JSC::JSCell::InternalLocker::InternalLocker):
(JSC::JSCell::InternalLocker::~InternalLocker):
(JSC::JSCell::atomicCompareExchangeCellStateWeakRelaxed):
(JSC::JSCell::atomicCompareExchangeCellStateStrong):
(JSC::JSCell::indexingTypeAndMiscOffset):
(JSC::JSCell::indexingTypeOffset): Deleted.
* runtime/JSCellInlines.h:
(JSC::JSCell::JSCell):
(JSC::JSCell::finishCreation):
(JSC::JSCell::indexingTypeAndMisc):
(JSC::JSCell::indexingType):
(JSC::JSCell::setStructure):
(JSC::JSCell::callDestructor):
(JSC::JSCell::lockInternalLock):
(JSC::JSCell::unlockInternalLock):
* runtime/JSObject.cpp:
(JSC::JSObject::visitButterfly):
(JSC::JSObject::visitChildren):
(JSC::JSFinalObject::visitChildren):
(JSC::JSObject::enterDictionaryIndexingModeWhenArrayStorageAlreadyExists):
(JSC::JSObject::createInitialUndecided):
(JSC::JSObject::createInitialInt32):
(JSC::JSObject::createInitialDouble):
(JSC::JSObject::createInitialContiguous):
(JSC::JSObject::createArrayStorage):
(JSC::JSObject::convertUndecidedToArrayStorage):
(JSC::JSObject::convertInt32ToArrayStorage):
(JSC::JSObject::convertDoubleToArrayStorage):
(JSC::JSObject::convertContiguousToArrayStorage):
(JSC::JSObject::deleteProperty):
(JSC::JSObject::defineOwnIndexedProperty):
(JSC::JSObject::increaseVectorLength):
(JSC::JSObject::ensureLengthSlow):
(JSC::JSObject::reallocateAndShrinkButterfly):
(JSC::JSObject::allocateMoreOutOfLineStorage):
(JSC::JSObject::shiftButterflyAfterFlattening):
(JSC::JSObject::growOutOfLineStorage): Deleted.
* runtime/JSObject.h:
(JSC::JSFinalObject::JSFinalObject):
(JSC::JSObject::setButterfly):
(JSC::JSObject::getOwnNonIndexPropertySlot):
(JSC::JSObject::fillCustomGetterPropertySlot):
(JSC::JSObject::getOwnPropertySlot):
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::setStructureAndButterfly): Deleted.
(JSC::JSObject::setButterflyWithoutChangingStructure): Deleted.
(JSC::JSObject::putDirectInternal): Deleted.
(JSC::JSObject::putDirectWithoutTransition): Deleted.
* runtime/JSObjectInlines.h:
(JSC::JSObject::getPropertySlot):
(JSC::JSObject::getNonIndexPropertySlot):
(JSC::JSObject::putDirectWithoutTransition):
(JSC::JSObject::putDirectInternal):
* runtime/Options.h:
* runtime/SparseArrayValueMap.h:
* runtime/Structure.cpp:
(JSC::Structure::dumpStatistics):
(JSC::Structure::findStructuresAndMapForMaterialization):
(JSC::Structure::materializePropertyTable):
(JSC::Structure::addNewPropertyTransition):
(JSC::Structure::changePrototypeTransition):
(JSC::Structure::attributeChangeTransition):
(JSC::Structure::toDictionaryTransition):
(JSC::Structure::takePropertyTableOrCloneIfPinned):
(JSC::Structure::nonPropertyTransition):
(JSC::Structure::isSealed):
(JSC::Structure::isFrozen):
(JSC::Structure::flattenDictionaryStructure):
(JSC::Structure::pin):
(JSC::Structure::pinForCaching):
(JSC::Structure::willStoreValueSlow):
(JSC::Structure::copyPropertyTableForPinning):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::getPropertyNamesFromStructure):
(JSC::Structure::visitChildren):
(JSC::Structure::materializePropertyMap): Deleted.
(JSC::Structure::addPropertyWithoutTransition): Deleted.
(JSC::Structure::removePropertyWithoutTransition): Deleted.
(JSC::Structure::copyPropertyTable): Deleted.
(JSC::Structure::createPropertyMap): Deleted.
(JSC::PropertyTable::checkConsistency): Deleted.
(JSC::Structure::checkConsistency): Deleted.
* runtime/Structure.h:
* runtime/StructureIDBlob.h:
(JSC::StructureIDBlob::StructureIDBlob):
(JSC::StructureIDBlob::indexingTypeIncludingHistory):
(JSC::StructureIDBlob::setIndexingTypeIncludingHistory):
(JSC::StructureIDBlob::indexingTypeIncludingHistoryOffset):
(JSC::StructureIDBlob::indexingType): Deleted.
(JSC::StructureIDBlob::setIndexingType): Deleted.
(JSC::StructureIDBlob::indexingTypeOffset): Deleted.
* runtime/StructureInlines.h:
(JSC::Structure::get):
(JSC::Structure::checkOffsetConsistency):
(JSC::Structure::checkConsistency):
(JSC::Structure::add):
(JSC::Structure::remove):
(JSC::Structure::addPropertyWithoutTransition):
(JSC::Structure::removePropertyWithoutTransition):
(JSC::Structure::setPropertyTable):
(JSC::Structure::putWillGrowOutOfLineStorage): Deleted.
(JSC::Structure::propertyTable): Deleted.
(JSC::Structure::suggestedNewOutOfLineStorageCapacity): Deleted.
Source/WTF:
The reason why I went to such great pains to make WTF::Lock fit in two bits is that I
knew that I would eventually need to stuff one into some miscellaneous bits of the
JSCell header. That time has come, because the concurrent GC has numerous race
conditions in visitChildren that can be trivially fixed if each object just has an
internal lock. Some cell types might use it to simply protect their entire visitChildren
function and anything that mutates the fields it touches, while other cell types might
use it as a "lock of last resort" to handle corner cases of an otherwise wait-free or
lock-free algorithm. Right now, it's used to protect certain transformations involving
indexing storage.
To make this happen, I factored the WTF::Lock algorithm into a LockAlgorithm struct that
is templatized on lock type (uint8_t for WTF::Lock), the isHeldBit value (1 for
WTF::Lock), and the hasParkedBit value (2 for WTF::Lock). This could have been done as
a templatized Lock class that basically contains Atomic<LockType>. You could then make
any field into a lock by bitwise_casting it to TemplateLock<field type, bit1, bit2>. But
this felt too dirty, so instead, LockAlgorithm has static methods that take
Atomic<LockType>& as their first argument. I think that this makes it more natural to
project a LockAlgorithm onto an existing Atomic<> field. Sadly, some places have to cast
their non-Atomic<> field to Atomic<> in order for this to work. Like so many other things
we do, this just shows that the C++ style of labeling fields that are subject to atomic
ops as atomic is counterproductive. Maybe some day I'll change LockAlgorithm to use our
other Atomics API, which does not require Atomic<>.
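To make the shape of that design concrete, here is a minimal sketch of an algorithm struct
that is templatized on the lock type and the two bit values and whose static methods take
the atomic word as their first argument. It is not the contents of LockAlgorithm.h:
LockAlgorithmSketch and CellLockSketch are hypothetical names, std::atomic stands in for
WTF's Atomic<>, the slow path yields instead of parking, and the bit positions 0x40 and
0x80 are picked only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Sketch only: LockType, isHeldBit and hasParkedBit mirror the template parameters named
// above. hasParkedBit is reserved here and never set, because this sketch does not park.
template<typename LockType, LockType isHeldBit, LockType hasParkedBit>
struct LockAlgorithmSketch {
    static bool tryLock(std::atomic<LockType>& word)
    {
        LockType old = word.load(std::memory_order_relaxed);
        for (;;) {
            if (old & isHeldBit)
                return false;
            // On failure, compare_exchange_weak reloads old, so unrelated bits are preserved.
            if (word.compare_exchange_weak(old, static_cast<LockType>(old | isHeldBit),
                    std::memory_order_acquire, std::memory_order_relaxed))
                return true;
        }
    }

    static void lock(std::atomic<LockType>& word)
    {
        // Assumption: yield instead of parking. The real slow path parks the thread.
        while (!tryLock(word))
            std::this_thread::yield();
    }

    static void unlock(std::atomic<LockType>& word)
    {
        // Clear only the held bit and leave every other bit of the word untouched.
        word.fetch_and(static_cast<LockType>(~isHeldBit), std::memory_order_release);
    }

    static bool isLocked(const std::atomic<LockType>& word)
    {
        return word.load(std::memory_order_acquire) & isHeldBit;
    }
};

// Project the algorithm onto two spare bits of a byte that also carries other data.
using CellLockSketch = LockAlgorithmSketch<uint8_t, 0x40, 0x80>;
```

Because the methods are static and operate on a caller-supplied word, a byte that already
holds other data in its low bits can be locked and unlocked without disturbing that data.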
WTF::Lock now uses LockAlgorithm. The slow paths are still outlined. I don't feel too
bad about the LockAlgorithm.h header being included in so many places because we change
that algorithm so infrequently.
Also, I added a hasElapsed(time) function. This function makes it so much more natural
to write timeslicing code, which the concurrent GC has to do a lot of.
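As a rough idea of why such a helper reads well in timeslicing code, here is a small sketch.
It is an assumption-laden analogue, not the WTF function: hasElapsedSketch, runForTimeslice
and SketchClock are hypothetical names, and std::chrono::steady_clock replaces
TimeWithDynamicClockType.

```cpp
#include <cassert>
#include <chrono>

using SketchClock = std::chrono::steady_clock;

// Hypothetical analogue of hasElapsed(time): true once the deadline is not in the future.
inline bool hasElapsedSketch(SketchClock::time_point deadline)
{
    return SketchClock::now() >= deadline;
}

// Runs step() repeatedly until the timeslice is used up and returns the number of steps run.
template<typename Step>
unsigned runForTimeslice(SketchClock::duration slice, Step step)
{
    assert(slice >= SketchClock::duration::zero());
    SketchClock::time_point deadline = SketchClock::now() + slice;
    unsigned steps = 0;
    while (!hasElapsedSketch(deadline)) {
        step();
        ++steps;
    }
    return steps;
}
```

A caller would pass a small unit of work as step, for example draining a bounded number of
items, so that the loop checks the deadline between units.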
* WTF.xcodeproj/project.pbxproj:
* wtf/CMakeLists.txt:
* wtf/ListDump.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl): Deleted.
* wtf/Lock.h:
(WTF::LockBase::lock):
(WTF::LockBase::tryLock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(): Deleted.
* wtf/LockAlgorithm.h: Added.
(WTF::LockAlgorithm::lockFastAssumingZero):
(WTF::LockAlgorithm::lockFast):
(WTF::LockAlgorithm::lock):
(WTF::LockAlgorithm::tryLock):
(WTF::LockAlgorithm::unlockFastAssumingZero):
(WTF::LockAlgorithm::unlockFast):
(WTF::LockAlgorithm::unlock):
(WTF::LockAlgorithm::unlockFairly):
(WTF::LockAlgorithm::isLocked):
(WTF::LockAlgorithm::lockSlow):
(WTF::LockAlgorithm::unlockSlow):
* wtf/TimeWithDynamicClockType.cpp:
(WTF::hasElapsed):
* wtf/TimeWithDynamicClockType.h:
Canonical link: https://commits.webkit.org/182434@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@208720 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-11-15 01:49:22 +00:00
|
|
|
|
Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
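The three levels of adaptation can be sketched as follows. Note that this deliberately
swaps in a different, simpler technique than the one this patch uses: instead of a tagged
pointer to a queue of statically allocated thread nodes, the sketch keeps a three-state byte
(in the style of a futex-based mutex) and puts waiters to sleep on a std::condition_variable.
AdaptiveLockSketch, countWithContention and the spin limit of 40 are assumptions made for
illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of a three-level adaptive lock: inlined CAS, bounded spinning, then sleeping.
class AdaptiveLockSketch {
public:
    void lock()
    {
        // Level 1: fast path, a single CAS on a lock that is not held.
        uint8_t expected = Unlocked;
        if (m_state.compare_exchange_strong(expected, Locked, std::memory_order_acquire))
            return;
        lockSlow();
    }

    void unlock()
    {
        // Only pay for a wakeup if some thread may be asleep.
        if (m_state.exchange(Unlocked, std::memory_order_release) == LockedWithSleepers)
            wakeOne();
    }

private:
    enum : uint8_t { Unlocked = 0, Locked = 1, LockedWithSleepers = 2 };

    void lockSlow()
    {
        // Level 2: a short burst of spinning for microcontention.
        for (unsigned i = 0; i < spinLimit; ++i) {
            uint8_t expected = Unlocked;
            if (m_state.compare_exchange_weak(expected, Locked, std::memory_order_acquire))
                return;
            std::this_thread::yield();
        }
        // Level 3: advertise a sleeper, then block until an unlock() wakes us up.
        while (m_state.exchange(LockedWithSleepers, std::memory_order_acquire) != Unlocked) {
            std::unique_lock<std::mutex> locker(m_sleepLock);
            // Re-check under m_sleepLock so that a concurrent unlock() cannot be missed.
            if (m_state.load(std::memory_order_relaxed) == LockedWithSleepers)
                m_wakeup.wait(locker);
        }
    }

    void wakeOne()
    {
        std::lock_guard<std::mutex> locker(m_sleepLock);
        m_wakeup.notify_one();
    }

    static constexpr unsigned spinLimit = 40;
    std::atomic<uint8_t> m_state { Unlocked };
    std::mutex m_sleepLock;
    std::condition_variable m_wakeup;
};

// Demo: several threads bump a plain counter under the lock and the total is returned.
inline long countWithContention(unsigned threadCount, unsigned iterations)
{
    AdaptiveLockSketch lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) {
        threads.emplace_back([&] {
            for (unsigned i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (std::thread& thread : threads)
        thread.join();
    return counter;
}
```

A thread that wins the lock from the sleeping path leaves the state at LockedWithSleepers,
which is conservative: its unlock() will issue a wakeup even if nobody is asleep anymore.
That costs an occasional unnecessary notify but keeps the sketch free of lost wakeups.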
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
This patch replaces all uses of SpinLock with Lock, since our critical sections are not
no-ops so if you do basically anything in your critical section, the Lock overhead will be
invisible. Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
as it should be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space benefits
of having a lock that just uses a byte are still better than the CPU wastage benefits of
Lock. But, this work will enable some future work to create locks that will fit in just 1.6
bits: https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to slow path spuriously, it may go there even if
there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal with the
possibility of a null queue head.
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
|
|
|
WTF_EXPORT_PRIVATE void lockSlow();
|
|
|
|
WTF_EXPORT_PRIVATE void unlockSlow();
|
WTF::Lock should be fair eventually
https://bugs.webkit.org/show_bug.cgi?id=159384
Reviewed by Geoffrey Garen.
Source/WTF:
In https://webkit.org/blog/6161/locking-in-webkit/ we showed how relaxing the fairness of
locks makes them fast. That post presented lock fairness as a trade-off between two
extremes:
- Barging. A barging lock, like WTF::Lock, releases the lock in unlock() even if there was a
thread on the queue. If there was a thread on the queue, the lock is released and that
thread is made runnable. That thread may then grab the lock, or some other thread may grab
the lock first (it may barge). Usually, the barging thread is the thread that released the
lock in the first place. This maximizes throughput but hurts fairness. There is no good
theoretical bound on how unfair the lock may become, but empirical data suggests that it's
fair enough for the cases we previously measured.
- FIFO. A FIFO lock, like HandoffLock in ToyLocks.h, does not release the lock in unlock()
if there is a thread waiting. If there is a thread waiting, unlock() will make that thread
runnable and inform it that it now holds the lock. This ensures perfect round-robin
fairness and allows us to reason theoretically about how long it may take for a thread to
grab the lock. For example, if we know that only N threads are running and each one may
contend on a critical section, and each one may hold the lock for at most S seconds, then
the time it takes to grab the lock is at most N * S. Unfortunately, FIFO locks perform very badly
in most cases. This is because for the common case of short critical sections, they force
a context switch after each critical section if the lock is contended.
This change makes WTF::Lock almost as fair as FIFO while still being as fast as barging.
Thanks to this new algorithm, you can now have both of these things at the same time.
This change makes WTF::Lock eventually fair. We can almost (more on the caveats below)
guarantee that the time it takes to grab a lock is N * max(1ms, S). In other words, critical
sections that are longer than 1ms are always fair. For shorter critical sections, the amount
of time that any thread waits is 1ms times the number of threads. There are some caveats
that arise from our use of randomness, but even then, in the limit as the critical section
length goes to infinity, the lock becomes fair. The corner cases are unlikely to happen; our
experiments show that the lock becomes exactly as fair as a FIFO lock for any critical
section that is 1ms or longer.
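As a worked version of the N * max(1ms, S) bound above, here is a minimal sketch. The names
fairnessQuantum and approximateWorstCaseWait are made up for illustration and are not part of
WTF, and the result ignores the randomness caveats mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Illustrative only: the 1ms figure comes from the text above; the names are hypothetical.
constexpr std::chrono::microseconds fairnessQuantum { 1000 };

// Approximate worst-case wait for N contending threads whose critical sections last at
// most S: N * max(1ms, S).
inline std::chrono::microseconds approximateWorstCaseWait(
    unsigned threadCount, std::chrono::microseconds criticalSection)
{
    return threadCount * std::max(fairnessQuantum, criticalSection);
}
```

For example, 10 threads with 100us critical sections are bounded by about 10ms, while 10
threads with 5ms critical sections are bounded by about 50ms.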
The fairness mechanism is broken into two parts. WTF::Lock can now choose to unlock a lock
fairly or unfairly thanks to the new ParkingLot token mechanism. WTF::Lock knows when to use
fair unlocking based on a timeout mechanism in ParkingLot called timeToBeFair.
ParkingLot::unparkOne() and ParkingLot::parkConditionally() can now communicate with each
other via a token. unparkOne() can pass a token, which parkConditionally() will return. This
change also makes parkConditionally() a lot more precise about when it was unparked due to a
call to unparkOne(). If unparkOne() is told that a thread was unparked then this thread is
guaranteed to report that it was unparked rather than timing out, and that thread is
guaranteed to get the token that unparkOne() passed. The token is an intptr_t. We use it as
a boolean variable in WTF::Lock, but you could use it to pass arbitrary data structures. By
default, the token is zero. WTF::Lock's unlock() will pass 1 as the token if it is doing
fair unlocking. In that case, unlock() will not release the lock, and lock() will know that
it holds the lock as soon as parkConditionally() returns. Note that this algorithm relies
on unparkOne() invoking WTF::Lock's callback while the queue lock is held, so that WTF::Lock
can make a decision about unlock strategy and inject a token while it has complete knowledge
over the state of the queue. As such, it's not immediately obvious how to implement this
algorithm on top of futexes. You really need ParkingLot!
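The token protocol described above can be sketched with a toy lock. This is not the WTF
implementation: it assumes a single std::mutex standing in for the ParkingLot queue lock, a
std::condition_variable standing in for parking, and every name in it (ToyTokenLock, Waiter,
DirectHandoff and so on) is hypothetical. What it tries to show is that the unlocker decides
between handoff and release while holding the queue lock, and the waiter learns the outcome
from the token.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>

// Toy model only; none of these names are WTF's.
class ToyTokenLock {
public:
    static constexpr intptr_t BargingOpportunity = 0; // Default token: lock was released.
    static constexpr intptr_t DirectHandoff = 1;      // Fair unlock: lock was handed over.

    void lock()
    {
        std::unique_lock<std::mutex> guard(m_queueLock);
        for (;;) {
            if (!m_isHeld) {
                m_isHeld = true;
                return;
            }
            Waiter self;
            m_waiters.push_back(&self);
            m_wake.wait(guard, [&] { return self.wasUnparked; });
            if (self.token == DirectHandoff)
                return; // The unlocker never released the lock, so it is already ours.
            // Otherwise the lock was released; retry and possibly lose to a barging thread.
        }
    }

    void unlock(bool fairly)
    {
        // The decision is made while the queue lock is held, so we know exactly whether
        // there is a waiter to hand the lock to.
        std::lock_guard<std::mutex> guard(m_queueLock);
        if (m_waiters.empty()) {
            m_isHeld = false;
            return;
        }
        Waiter* next = m_waiters.front();
        m_waiters.pop_front();
        next->token = fairly ? DirectHandoff : BargingOpportunity;
        next->wasUnparked = true;
        if (!fairly)
            m_isHeld = false;
        m_wake.notify_all();
    }

    bool isHeld()
    {
        std::lock_guard<std::mutex> guard(m_queueLock);
        return m_isHeld;
    }

private:
    struct Waiter {
        bool wasUnparked { false };
        intptr_t token { BargingOpportunity };
    };

    std::mutex m_queueLock;
    std::condition_variable m_wake;
    std::deque<Waiter*> m_waiters;
    bool m_isHeld { false };
};
```

Note that in the fair case m_isHeld never becomes false, which is the toy analogue of unlock()
not releasing the lock.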
WTF::Lock does not use fair unlocking every time. We expose a new API, Lock::unlockFairly(),
which forces the fair unlocking behavior. Additionally, ParkingLot now maintains a
per-bucket stochastic fairness timeout. When the timeout fires, the unparkOne() callback
sees UnparkResult::timeToBeFair = true. This timeout is set to be anywhere from 0ms to 1ms
at random. When a dequeue happens and there are threads that actually get dequeued, we check
if the time since the last fair unlock (the last time timeToBeFair was set to true) is
more than the timeout amount. If so, then we set timeToBeFair to true and reset the timeout.
This means that in the absence of ParkingLot collisions, fair unlocking is guaranteed to
happen at least once per millisecond. It will happen at 2 kHz on average. If there are
collisions, then each collision adds one millisecond to the worst case (and 0.5 ms to the
average case). The reason why we don't just use a fixed 1ms timeout is that we want to avoid
resonance. Imagine a program in which some thread acquires a lock at 1 kHz in-phase with the
timeToBeFair timeout. Then this thread would be the beneficiary of fairness to the detriment
of everyone else. Randomness ensures that we aren't too fair to any one thread.
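A minimal sketch of the randomized timeout described above, assuming one timer object per
bucket. The class name ToyFairnessTimer and its method are hypothetical; ParkingLot's actual
bookkeeping may differ.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Hypothetical stand-in for the per-bucket fairness state; not ParkingLot's code.
class ToyFairnessTimer {
public:
    using Clock = std::chrono::steady_clock;

    explicit ToyFairnessTimer(Clock::time_point now, unsigned seed = 1)
        : m_random(seed)
        , m_nextFairTime(now + randomTimeout())
    {
    }

    // Call this when a dequeue actually wakes a thread. Returns true if this unlock should
    // be fair, and in that case picks a fresh random deadline.
    bool shouldBeFair(Clock::time_point now)
    {
        if (now < m_nextFairTime)
            return false;
        m_nextFairTime = now + randomTimeout();
        return true;
    }

private:
    // Uniformly random in [0ms, 1ms), so that a periodic locker cannot stay in phase.
    Clock::duration randomTimeout()
    {
        std::uniform_int_distribution<long long> distribution(0, 999);
        return std::chrono::duration_cast<Clock::duration>(
            std::chrono::microseconds(distribution(m_random)));
    }

    std::mt19937 m_random;
    Clock::time_point m_nextFairTime;
};
```

Since every deadline is less than 1ms after the previous fair unlock, any two eligible
dequeues that are at least 1ms apart are both fair in this sketch.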
Empirically, this is neutral on our major benchmarks like JetStream but it's an enormous
improvement in LockFairnessTest. It's common for an unfair lock (either our BargingLock, the
old WTF::Lock, any of the other futex-based locks that barge, or new os_unfair_lock) to
allow only one thread to hold the lock during a whole second in which each thread is holding
the lock for 1ms at a time. This is because in a barging lock, releasing a lock after
holding it for 1ms and then reacquiring it immediately virtually ensures that none of the
other threads can wake up in time to grab it before it's relocked. But the new WTF::Lock
handles this case like a champ: each thread gets equal turns.
Here's some data. If we launch 10 threads and have each of them run for 1 second while
repeatedly holding a critical section for 1ms, then here's how many times each thread gets
to hold the lock using the old WTF::Lock algorithm:
799, 6, 1, 1, 1, 1, 1, 1, 1, 1
One thread hogged the lock for almost the whole time! With the new WTF::Lock, the lock
becomes totally fair:
80, 79, 79, 79, 79, 79, 79, 80, 80, 79
I don't know of anyone creating such an automatically-fair adaptive lock before, so I think
that this is a pretty awesome advancement to the state of the art!
This change is good for three reasons:
- We do have long critical sections in WebKit and we don't want to have to worry about
starvation. This reduces the likelihood that we will see starvation due to our lock
strategy.
- I was talking to ggaren about bmalloc's locking needs, and he wanted unlockFairly() or
lockFairly() or some moral equivalent for the scavenger thread.
- If we use a WTF::Lock to manage heap access in a multithreaded GC, we'll need the ability
to unlock and relock without barging.
* benchmarks/LockFairnessTest.cpp:
(main):
* benchmarks/ToyLocks.h:
* wtf/Condition.h:
(WTF::ConditionBase::waitUntil):
(WTF::ConditionBase::notifyOne):
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
(WTF::LockBase::unlockFairlySlow):
(WTF::LockBase::unlockSlowImpl):
* wtf/Lock.h:
(WTF::LockBase::try_lock):
(WTF::LockBase::unlock):
(WTF::LockBase::unlockFairly):
(WTF::LockBase::isHeld):
(WTF::LockBase::isFullyReset):
* wtf/ParkingLot.cpp:
(WTF::ParkingLot::parkConditionallyImpl):
(WTF::ParkingLot::unparkOne):
(WTF::ParkingLot::unparkOneImpl):
(WTF::ParkingLot::unparkAll):
* wtf/ParkingLot.h:
(WTF::ParkingLot::parkConditionally):
(WTF::ParkingLot::compareAndPark):
(WTF::ParkingLot::unparkOne):
Tools:
* TestWebKitAPI/Tests/WTF/ParkingLot.cpp:
Canonical link: https://commits.webkit.org/178039@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@203350 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-07-18 18:32:52 +00:00
WTF_EXPORT_PRIVATE void unlockFairlySlow();
PerformanceTests:
Concurrent GC should be stable enough to land enabled
https://bugs.webkit.org/show_bug.cgi?id=164990
Reviewed by Geoffrey Garen.
Made CDjs more configurable and refined the "large.js" configuration. I was using that one and
the new "long.js" configuration to tune concurrent eden GCs.
Added a new way of running Splay in browser, which uses chartjs to plot the execution times of
2000 iterations. This includes the minified chartjs.
* JetStream/Octane2/splay-detail.html: Added.
* JetStream/cdjs/benchmark.js:
(benchmarkImpl):
(benchmark):
* JetStream/cdjs/long.js: Added.
Source/JavaScriptCore:
Concurrent GC should be stable enough to land enabled on X86_64
https://bugs.webkit.org/show_bug.cgi?id=164990
Reviewed by Geoffrey Garen.
This fixes a ton of performance and correctness bugs revealed by getting the concurrent GC to
be stable enough to land enabled.
I had to redo the JSObject::visitChildren concurrency protocol again. This time I think it's
even more correct than ever!
This is an enormous win on JetStream/splay-latency and Octane/SplayLatency. It looks to be
mostly neutral on everything else, though Speedometer is showing statistically weak signs of a
slight regression.
* API/JSAPIWrapperObject.mm: Added locking.
(JSC::JSAPIWrapperObject::visitChildren):
* API/JSCallbackObject.h: Added locking.
(JSC::JSCallbackObjectData::visitChildren):
(JSC::JSCallbackObjectData::JSPrivatePropertyMap::setPrivateProperty):
(JSC::JSCallbackObjectData::JSPrivatePropertyMap::deletePrivateProperty):
(JSC::JSCallbackObjectData::JSPrivatePropertyMap::visitChildren):
* CMakeLists.txt:
* JavaScriptCore.xcodeproj/project.pbxproj:
* bytecode/CodeBlock.cpp:
(JSC::CodeBlock::UnconditionalFinalizer::finalizeUnconditionally): This had a TOCTOU race on shouldJettisonDueToOldAge.
(JSC::EvalCodeCache::visitAggregate): Moved to EvalCodeCache.cpp.
* bytecode/DirectEvalCodeCache.cpp: Added. Outlined some functions and made them use locks.
(JSC::DirectEvalCodeCache::setSlow):
(JSC::DirectEvalCodeCache::clear):
(JSC::DirectEvalCodeCache::visitAggregate):
* bytecode/DirectEvalCodeCache.h:
(JSC::DirectEvalCodeCache::set):
(JSC::DirectEvalCodeCache::clear): Deleted.
* bytecode/UnlinkedCodeBlock.cpp: Added locking.
(JSC::UnlinkedCodeBlock::visitChildren):
(JSC::UnlinkedCodeBlock::setInstructions):
(JSC::UnlinkedCodeBlock::shrinkToFit):
* bytecode/UnlinkedCodeBlock.h: Added locking.
(JSC::UnlinkedCodeBlock::addRegExp):
(JSC::UnlinkedCodeBlock::addConstant):
(JSC::UnlinkedCodeBlock::addFunctionDecl):
(JSC::UnlinkedCodeBlock::addFunctionExpr):
(JSC::UnlinkedCodeBlock::createRareDataIfNecessary):
(JSC::UnlinkedCodeBlock::shrinkToFit): Deleted.
* debugger/Debugger.cpp: Use the right delete API.
(JSC::Debugger::recompileAllJSFunctions):
* dfg/DFGAbstractInterpreterInlines.h:
(JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects): Fix a pre-existing bug in ToFunction constant folding.
* dfg/DFGClobberize.h: Add support for nuking.
(JSC::DFG::clobberize):
* dfg/DFGClobbersExitState.cpp: Add support for nuking.
(JSC::DFG::clobbersExitState):
* dfg/DFGFixupPhase.cpp: Add support for nuking.
(JSC::DFG::FixupPhase::fixupNode):
(JSC::DFG::FixupPhase::indexForChecks):
(JSC::DFG::FixupPhase::originForCheck):
(JSC::DFG::FixupPhase::speculateForBarrier):
(JSC::DFG::FixupPhase::insertCheck):
(JSC::DFG::FixupPhase::fixupChecksInBlock):
* dfg/DFGSpeculativeJIT.cpp: Add support for nuking.
(JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage):
(JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage):
* ftl/FTLLowerDFGToB3.cpp: Add support for nuking.
(JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::reallocatePropertyStorage):
(JSC::FTL::DFG::LowerDFGToB3::mutatorFence):
(JSC::FTL::DFG::LowerDFGToB3::nukeStructureAndSetButterfly):
(JSC::FTL::DFG::LowerDFGToB3::setButterfly): Deleted.
* heap/CodeBlockSet.cpp: We need to be more careful about the CodeBlockSet workflow during GC, since we will allocate CodeBlocks in eden while collecting.
(JSC::CodeBlockSet::clearMarksForFullCollection):
(JSC::CodeBlockSet::deleteUnmarkedAndUnreferenced):
* heap/Heap.cpp: Added code to measure max pauses. Added a better collectContinuously mode.
(JSC::Heap::lastChanceToFinalize): Stop the collectContinuously thread.
(JSC::Heap::harvestWeakReferences): Inline SlotVisitor::harvestWeakReferences.
(JSC::Heap::finalizeUnconditionalFinalizers): Inline SlotVisitor::finalizeUnconditionalReferences.
(JSC::Heap::markToFixpoint): We need to do some MarkedSpace stuff before every conservative scan, rather than just at the start of marking, so we now call prepareForConservativeScan() before each conservative scan. Also call a less-parallel version of drainInParallel when the mutator is running.
(JSC::Heap::collectInThread): Inline Heap::prepareForAllocation().
(JSC::Heap::stopIfNecessarySlow): We need to be more careful about ensuring that we run finalization before and after stopping. Also, we should sanitize stack when stopping the world.
(JSC::Heap::acquireAccessSlow): Add some optional debug prints.
(JSC::Heap::handleNeedFinalize): Assert that we are running this when the world is not stopped.
(JSC::Heap::finalize): Remove the old collectContinuously code.
(JSC::Heap::requestCollection): We don't need to sanitize stack here anymore.
(JSC::Heap::notifyIsSafeToCollect): Start the collectContinuously thread. It will request collection at 1 kHz.
(JSC::Heap::prepareForAllocation): Deleted.
(JSC::Heap::preventCollection): Prevent any new concurrent GCs from being initiated.
(JSC::Heap::allowCollection):
(JSC::Heap::forEachSlotVisitor): Allows us to safely iterate slot visitors.
* heap/Heap.h:
* heap/HeapInlines.h:
(JSC::Heap::writeBarrier): If the 'to' cell is not NewWhite then it could be AnthraciteOrBlack. During a full collection, objects may be AnthraciteOrBlack from a previous GC. Turns out, we don't benefit from this optimization so we can just kill it.
* heap/HeapSnapshotBuilder.cpp:
(JSC::HeapSnapshotBuilder::buildSnapshot): This needs to use PreventCollectionScope to ensure snapshot soundness.
* heap/ListableHandler.h:
(JSC::ListableHandler::isOnList): Useful helper.
* heap/LockDuringMarking.h:
(JSC::lockDuringMarking): It's a locker that only locks while we're marking.
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::addBlock): Hold the bitvector lock while resizing.
* heap/MarkedBlock.cpp: Hold the bitvector lock while accessing the bitvectors while the mutator is running.
* heap/MarkedSpace.cpp:
(JSC::MarkedSpace::prepareForConservativeScan): We used to do this in prepareForMarking, but we need to do it before each conservative scan not just before marking.
(JSC::MarkedSpace::prepareForMarking): Remove the logic moved to prepareForConservativeScan.
* heap/MarkedSpace.h:
* heap/PreventCollectionScope.h: Added.
* heap/SlotVisitor.cpp: Refactored drainFromShared so that we can write a similar function called drainInParallelPassively.
(JSC::SlotVisitor::updateMutatorIsStopped): Update whether we can use "fast" scanning.
(JSC::SlotVisitor::mutatorIsStoppedIsUpToDate):
(JSC::SlotVisitor::didReachTermination):
(JSC::SlotVisitor::hasWork):
(JSC::SlotVisitor::drain): This now uses the rightToRun lock to allow the main GC thread to safepoint the workers.
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::drainInParallelPassively): This runs marking with one fewer threads than normal. It's useful for when we have resumed the mutator, since then the mutator has a better chance of getting on a core.
(JSC::SlotVisitor::addWeakReferenceHarvester):
(JSC::SlotVisitor::addUnconditionalFinalizer):
(JSC::SlotVisitor::harvestWeakReferences): Deleted.
(JSC::SlotVisitor::finalizeUnconditionalFinalizers): Deleted.
* heap/SlotVisitor.h:
* heap/SlotVisitorInlines.h: Outline stuff.
(JSC::SlotVisitor::addWeakReferenceHarvester): Deleted.
(JSC::SlotVisitor::addUnconditionalFinalizer): Deleted.
* runtime/InferredType.cpp: This needed thread safety.
(JSC::InferredType::visitChildren): This needs to keep its structure finalizer alive until it runs.
(JSC::InferredType::set):
(JSC::InferredType::InferredStructureFinalizer::finalizeUnconditionally):
* runtime/InferredType.h:
* runtime/InferredValue.cpp: This needed thread safety.
(JSC::InferredValue::visitChildren):
(JSC::InferredValue::ValueCleanup::finalizeUnconditionally):
* runtime/JSArray.cpp:
(JSC::JSArray::unshiftCountSlowCase): Update to use new butterfly API.
(JSC::JSArray::unshiftCountWithArrayStorage): Update to use new butterfly API.
* runtime/JSArrayBufferView.cpp:
(JSC::JSArrayBufferView::visitChildren): Thread safety.
* runtime/JSCell.h:
(JSC::JSCell::setStructureIDDirectly): This is used for nuking the structure.
(JSC::JSCell::InternalLocker::InternalLocker): Deleted. The cell is now the lock.
(JSC::JSCell::InternalLocker::~InternalLocker): Deleted. The cell is now the lock.
* runtime/JSCellInlines.h:
(JSC::JSCell::structure): Clean this up.
(JSC::JSCell::lock): The cell is now the lock.
(JSC::JSCell::tryLock):
(JSC::JSCell::unlock):
(JSC::JSCell::isLocked):
(JSC::JSCell::lockInternalLock): Deleted.
(JSC::JSCell::unlockInternalLock): Deleted.
* runtime/JSFunction.cpp:
(JSC::JSFunction::visitChildren): Thread safety.
* runtime/JSGenericTypedArrayViewInlines.h:
(JSC::JSGenericTypedArrayView<Adaptor>::visitChildren): Thread safety.
(JSC::JSGenericTypedArrayView<Adaptor>::slowDownAndWasteMemory): Thread safety.
* runtime/JSObject.cpp:
(JSC::JSObject::markAuxiliaryAndVisitOutOfLineProperties): Factor out this "easy" step of butterfly visiting.
(JSC::JSObject::visitButterfly): Make this achieve 100% precision about structure-butterfly relationships. This relies on the mutator "nuking" the structure prior to "locked" structure-butterfly transitions.
(JSC::JSObject::visitChildren): Use the new, nicer API.
(JSC::JSFinalObject::visitChildren): Use the new, nicer API.
(JSC::JSObject::enterDictionaryIndexingModeWhenArrayStorageAlreadyExists): Use the new butterfly API.
(JSC::JSObject::createInitialUndecided): Use the new butterfly API.
(JSC::JSObject::createInitialInt32): Use the new butterfly API.
(JSC::JSObject::createInitialDouble): Use the new butterfly API.
(JSC::JSObject::createInitialContiguous): Use the new butterfly API.
(JSC::JSObject::createArrayStorage): Use the new butterfly API.
(JSC::JSObject::convertUndecidedToContiguous): Use the new butterfly API.
(JSC::JSObject::convertUndecidedToArrayStorage): Use the new butterfly API.
(JSC::JSObject::convertInt32ToArrayStorage): Use the new butterfly API.
(JSC::JSObject::convertDoubleToContiguous): Use the new butterfly API.
(JSC::JSObject::convertDoubleToArrayStorage): Use the new butterfly API.
(JSC::JSObject::convertContiguousToArrayStorage): Use the new butterfly API.
(JSC::JSObject::increaseVectorLength): Use the new butterfly API.
(JSC::JSObject::shiftButterflyAfterFlattening): Use the new butterfly API.
* runtime/JSObject.h:
(JSC::JSObject::setButterfly): This now does all of the fences. Only use this when you are not also transitioning the structure or the structure's lastOffset.
(JSC::JSObject::nukeStructureAndSetButterfly): Use this when doing locked structure-butterfly transitions.
* runtime/JSObjectInlines.h:
(JSC::JSObject::putDirectWithoutTransition): Use the newly factored out API.
(JSC::JSObject::prepareToPutDirectWithoutTransition): Factor this out!
(JSC::JSObject::putDirectInternal): Use the newly factored out API.
* runtime/JSPropertyNameEnumerator.cpp:
(JSC::JSPropertyNameEnumerator::finishCreation): Locks!
(JSC::JSPropertyNameEnumerator::visitChildren): Locks!
* runtime/JSSegmentedVariableObject.cpp:
(JSC::JSSegmentedVariableObject::visitChildren): Locks!
* runtime/JSString.cpp:
(JSC::JSString::visitChildren): Thread safety.
* runtime/ModuleProgramExecutable.cpp:
(JSC::ModuleProgramExecutable::visitChildren): Thread safety.
* runtime/Options.cpp: For now we disable concurrent GC on not-X86_64.
(JSC::recomputeDependentOptions):
* runtime/Options.h: Change the default max GC parallelism to 8. I don't know why it was still 7.
* runtime/SamplingProfiler.cpp:
(JSC::SamplingProfiler::stackTracesAsJSON): This needs to defer GC before grabbing its lock.
* runtime/SparseArrayValueMap.cpp: This needed thread safety.
(JSC::SparseArrayValueMap::add):
(JSC::SparseArrayValueMap::remove):
(JSC::SparseArrayValueMap::visitChildren):
* runtime/SparseArrayValueMap.h:
* runtime/Structure.cpp: This had a race between addNewPropertyTransition and visitChildren.
(JSC::Structure::Structure):
(JSC::Structure::materializePropertyTable):
(JSC::Structure::addNewPropertyTransition):
(JSC::Structure::flattenDictionaryStructure):
(JSC::Structure::add): Help out with nuking support - the m_offset needs to play along.
(JSC::Structure::visitChildren):
* runtime/Structure.h: Make some useful things public - like the notion of a lastOffset.
* runtime/StructureChain.cpp:
(JSC::StructureChain::visitChildren): Thread safety!
* runtime/StructureChain.h: Thread safety!
* runtime/StructureIDTable.cpp:
(JSC::StructureIDTable::allocateID): Ensure that we don't get nuked IDs.
* runtime/StructureIDTable.h: Add the notion of a nuked ID! It's a bit that the runtime never sees except during specific shady actions like locked structure-butterfly transitions. "Nuking" tells the GC to steer clear and rescan once we fire the barrier.
(JSC::nukedStructureIDBit):
(JSC::nuke):
(JSC::isNuked):
(JSC::decontaminate):
* runtime/StructureInlines.h:
(JSC::Structure::hasIndexingHeader): Better API.
(JSC::Structure::add):
* runtime/VM.cpp: Better GC interaction.
(JSC::VM::ensureWatchdog):
(JSC::VM::deleteAllLinkedCode):
(JSC::VM::deleteAllCode):
* runtime/VM.h:
(JSC::VM::getStructure): Why wasn't this always an API!
* runtime/WebAssemblyExecutable.cpp:
(JSC::WebAssemblyExecutable::visitChildren): Thread safety.
Source/WebCore:
Concurrent GC should be stable enough to land enabled on X86_64
https://bugs.webkit.org/show_bug.cgi?id=164990
Reviewed by Geoffrey Garen.
Made WebCore down with concurrent marking by adding some locking and adapting to some new API.
This has new test modes in run-jsc-stress-tests. Also, the way that LayoutTests run is already
a fantastic GC test.
* ForwardingHeaders/heap/DeleteAllCodeEffort.h: Added.
* ForwardingHeaders/heap/LockDuringMarking.h: Added.
* bindings/js/GCController.cpp:
(WebCore::GCController::deleteAllCode):
(WebCore::GCController::deleteAllLinkedCode):
* bindings/js/GCController.h:
* bindings/js/JSDOMBinding.cpp:
(WebCore::getCachedDOMStructure):
(WebCore::cacheDOMStructure):
* bindings/js/JSDOMGlobalObject.cpp:
(WebCore::JSDOMGlobalObject::addBuiltinGlobals):
(WebCore::JSDOMGlobalObject::visitChildren):
* bindings/js/JSDOMGlobalObject.h:
(WebCore::getDOMConstructor):
* bindings/js/JSDOMPromise.cpp:
(WebCore::DeferredPromise::DeferredPromise):
(WebCore::DeferredPromise::clear):
* bindings/js/JSXPathResultCustom.cpp:
(WebCore::JSXPathResult::visitAdditionalChildren):
* dom/EventListenerMap.cpp:
(WebCore::EventListenerMap::clear):
(WebCore::EventListenerMap::replace):
(WebCore::EventListenerMap::add):
(WebCore::EventListenerMap::remove):
(WebCore::EventListenerMap::find):
(WebCore::EventListenerMap::removeFirstEventListenerCreatedFromMarkup):
(WebCore::EventListenerMap::copyEventListenersNotCreatedFromMarkupToTarget):
(WebCore::EventListenerIterator::EventListenerIterator):
* dom/EventListenerMap.h:
(WebCore::EventListenerMap::lock):
* dom/EventTarget.cpp:
(WebCore::EventTarget::visitJSEventListeners):
* dom/EventTarget.h:
(WebCore::EventTarget::visitJSEventListeners): Deleted.
* dom/Node.cpp:
(WebCore::Node::eventTargetDataConcurrently):
(WebCore::Node::ensureEventTargetData):
(WebCore::Node::clearEventTargetData):
* dom/Node.h:
* page/MemoryRelease.cpp:
(WebCore::releaseCriticalMemory):
* page/cocoa/MemoryReleaseCocoa.mm:
(WebCore::jettisonExpensiveObjectsOnTopLevelNavigation):
(WebCore::registerMemoryReleaseNotifyCallbacks):
Source/WTF:
Concurrent GC should be stable enough to land enabled on X86_64
https://bugs.webkit.org/show_bug.cgi?id=164990
Reviewed by Geoffrey Garen.
Adds the ability to say:
auto locker = holdLock(any type of lock)
Instead of having to say:
Locker<LockType> locker(locks of type LockType)
I think that we should use "auto locker = holdLock(lock)" as the default way that we acquire
locks unless we need to use a special locker type.
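The idea behind holdLock() can be sketched as a function template that deduces the lock type
and returns a movable RAII guard. This is only an illustration: ToyLocker, toyHoldLock and
ToyFlagLock are hypothetical names, and the real WTF::Locker has more features than this.

```cpp
#include <cassert>

// Hypothetical stand-in for WTF::Locker: a movable RAII guard over any lock type that has
// lock() and unlock().
template<typename LockType>
class ToyLocker {
public:
    explicit ToyLocker(LockType& lock) : m_lock(&lock) { m_lock->lock(); }
    ToyLocker(ToyLocker&& other) : m_lock(other.m_lock) { other.m_lock = nullptr; }
    ToyLocker(const ToyLocker&) = delete;
    ToyLocker& operator=(const ToyLocker&) = delete;
    ~ToyLocker()
    {
        if (m_lock)
            m_lock->unlock();
    }

private:
    LockType* m_lock;
};

// The lock type is deduced from the argument, so callers never have to spell it out.
template<typename LockType>
ToyLocker<LockType> toyHoldLock(LockType& lock)
{
    return ToyLocker<LockType>(lock);
}

// Trivial single-threaded lock used only to make the example checkable.
struct ToyFlagLock {
    void lock() { assert(!held); held = true; }
    void unlock() { assert(held); held = false; }
    bool held { false };
};
```

With this, "auto locker = toyHoldLock(someLock);" works for any lock type, and the lock is
released when locker goes out of scope.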
This also adds the ability to safepoint a lock. Safepointing a lock is basically a super fast
way of unlocking it fairly and then immediately relocking it - i.e. letting anyone who is
waiting run, without losing steam if there is no one waiting.
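A sketch of the safepoint idea, under the assumption that the lock can cheaply tell whether
anyone is parked on it. The function toySafepoint, the hasParkedThreads() query and the
ToyRecordingLock type are all hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Assumes LockType offers hasParkedThreads(), unlockFairly() and lock(). The caller must
// already hold the lock.
template<typename LockType>
void toySafepoint(LockType& lock)
{
    if (!lock.hasParkedThreads())
        return; // Fast path: nobody is waiting, so keep the lock and keep going.
    lock.unlockFairly(); // Hand the lock to a waiter instead of letting ourselves barge.
    lock.lock();         // Then queue up for it again.
}

// Fake lock that just records what was called, to make the example checkable.
struct ToyRecordingLock {
    bool hasParkedThreads() const { return parked; }
    void unlockFairly() { ++fairUnlocks; }
    void lock() { ++relocks; }

    bool parked { false };
    int fairUnlocks { 0 };
    int relocks { 0 };
};
```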
* wtf/Lock.cpp:
(WTF::LockBase::safepointSlow):
* wtf/Lock.h:
(WTF::LockBase::safepoint):
* wtf/LockAlgorithm.h:
(WTF::LockAlgorithm::safepointFast):
(WTF::LockAlgorithm::safepoint):
(WTF::LockAlgorithm::safepointSlow):
* wtf/Locker.h:
(WTF::AbstractLocker::AbstractLocker):
(WTF::Locker::tryLock):
(WTF::Locker::operator bool):
(WTF::Locker::Locker):
(WTF::Locker::operator=):
(WTF::holdLock):
(WTF::tryHoldLock):
Tools:
Concurrent GC should be stable enough to land enabled
https://bugs.webkit.org/show_bug.cgi?id=164990
Reviewed by Geoffrey Garen.
Add a new mode that runs GC continuously. Also made eager modes run GC continuously.
It's clear that this works just fine in release, but I'm still trying to figure out if it's
safe for debug. It might be too slow for debug.
* Scripts/run-jsc-stress-tests:
Canonical link: https://commits.webkit.org/183229@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@209570 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-12-08 22:14:50 +00:00
WTF_EXPORT_PRIVATE void safepointSlow();
Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
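The three levels of adaptation can be sketched as follows. This is a simplified model and not
the code in this patch: instead of a tagged pointer to a queue of thread nodes, it assumes two
state bits plus a std::mutex and std::condition_variable as the place where threads sleep. All
names (ToyAdaptiveLock, spinLimit and so on) are hypothetical, and the spin limit of 40 is an
arbitrary choice.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

class ToyAdaptiveLock {
public:
    void lock()
    {
        // Level 1: inlined fast path, a single CAS on an unheld lock.
        uint8_t expected = 0;
        if (m_state.compare_exchange_weak(expected, isHeldBit))
            return;
        lockSlow();
    }

    void unlock()
    {
        uint8_t expected = isHeldBit;
        if (m_state.compare_exchange_weak(expected, 0))
            return;
        unlockSlow(); // Someone is parked, or the weak CAS failed spuriously.
    }

    bool isHeld() const { return m_state.load() & isHeldBit; }

private:
    static constexpr uint8_t isHeldBit = 1;
    static constexpr uint8_t hasParkedBit = 2;
    static constexpr unsigned spinLimit = 40;

    void lockSlow()
    {
        unsigned spinCount = 0;
        for (;;) {
            uint8_t current = m_state.load();
            if (!(current & isHeldBit)) {
                if (m_state.compare_exchange_weak(current, static_cast<uint8_t>(current | isHeldBit)))
                    return;
                continue;
            }
            // Level 2: a short burst of spinning for microcontention.
            if (spinCount < spinLimit) {
                ++spinCount;
                std::this_thread::yield();
                continue;
            }
            // Level 3: go to sleep. The state is rechecked under m_parkLock so that a
            // concurrent unlockSlow() cannot slip in between the check and the wait.
            std::unique_lock<std::mutex> guard(m_parkLock);
            current = m_state.load();
            if (!(current & isHeldBit))
                continue;
            if (!(current & hasParkedBit)
                && !m_state.compare_exchange_weak(current, static_cast<uint8_t>(current | hasParkedBit)))
                continue;
            m_parkCondition.wait(guard);
        }
    }

    void unlockSlow()
    {
        // Works whether or not anyone is actually parked.
        std::lock_guard<std::mutex> guard(m_parkLock);
        m_state.store(0);
        m_parkCondition.notify_all();
    }

    std::atomic<uint8_t> m_state { 0 };
    std::mutex m_parkLock;
    std::condition_variable m_parkCondition;
};
```

This sketch wakes every sleeping thread on unlock and makes no attempt at fairness; it is only
meant to show how the fast path, the spin loop and the sleep path fit together.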
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
This patch replaces all uses of SpinLock with Lock, since our critical sections are not
no-ops so if you do basically anything in your critical section, the Lock overhead will be
invisible. Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
as it should be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space benefits
of having a lock that just uses a byte still outweigh the CPU time that Lock would save.
But, this work will enable some future work to create locks that will fit in just 1.6
bits: https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to slow path spuriously, it may go there even if
there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal with the
possibility of a null queue head.
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
WTF::Lock should not suffer from the thundering herd
https://bugs.webkit.org/show_bug.cgi?id=147947
Reviewed by Geoffrey Garen.
Source/WTF:
This changes Lock::unlockSlow() to use unparkOne() instead of unparkAll(). The problem with
doing this is that it's not obvious after calling unparkOne() if there are any other threads
that are still parked on the lock's queue. If we assume that there are and leave the
hasParkedBit set, then future calls to unlock() will take the slow path. We don't want that
if there aren't actually any threads parked. On the other hand, if we assume that there
aren't any threads parked and clear the hasParkedBit when some threads actually were
parked, they may never be awoken, since future calls to unlock() won't take the slow
path and so won't call unparkOne(). In other words, we need a way to be very precise about
when we clear the hasParkedBit and we need to do it in a race-free way: it can't be the case
that we clear the bit just as some thread gets parked on the queue.
A similar problem arises in futexes, and one of the solutions is to have a thread that
acquires a lock after parking set the hasParkedBit. This is what Rusty Russell's usersem
does. It's a subtle algorithm. Also, it means that if a thread barges in before the unparked
thread runs, then that barging thread will not know that there are threads parked. This
could increase the severity of barging.
Since ParkingLot is a user-level API, we don't have to worry about the kernel-user security
issues and so we can expose callbacks while ParkingLot is holding its internal locks. This
change does exactly that for unparkOne(). The new variant of unparkOne() will call a user
function while the queue from which we are unparking is locked. The callback is told basic
stats about the queue: did we unpark a thread this time, and could there be more threads to
unpark in the future. The callback runs while it's impossible for the queue state to change,
since the ParkingLot's internal lock for the queue is held. This means that
Lock::unlockSlow() can either clear, or leave, the hasParkedBit while releasing the lock
inside the callback from unparkOne(). This takes care of the thundering herd problem while
also reducing the greed that arises from barging threads.
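The callback scheme described above can be sketched roughly as follows. This is a minimal illustration, not the ParkingLot implementation: ParkingQueueSketch, UnparkResultSketch, HerdFreeLockSketch and runUnparkOneDemo are hypothetical names, the queue serves a single lock rather than a hashtable of addresses, and the spinning phase is left out. The point it shows is that the unlocking thread releases the lock and decides the fate of the hasParkedBit inside the callback, while the queue's own mutex is held.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stats handed to the unparkOne() callback.
struct UnparkResultSketch {
    bool didUnparkThread { false };
    bool mayHaveMoreThreads { false };
};

// Hypothetical single-queue stand-in for ParkingLot.
class ParkingQueueSketch {
public:
    // Parks the caller only if validation() still holds once the queue is locked.
    template<typename Validation>
    void parkConditionally(Validation&& validation)
    {
        std::unique_lock<std::mutex> guard(m_mutex);
        if (!validation())
            return;
        Waiter self;
        m_waiters.push_back(&self);
        self.condition.wait(guard, [&] { return self.shouldWake; });
    }

    // Wakes at most one waiter, then runs the callback while the queue is locked.
    template<typename Callback>
    void unparkOne(Callback&& callback)
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        UnparkResultSketch result;
        if (!m_waiters.empty()) {
            Waiter* waiter = m_waiters.front();
            m_waiters.pop_front();
            waiter->shouldWake = true;
            waiter->condition.notify_one();
            result.didUnparkThread = true;
            result.mayHaveMoreThreads = !m_waiters.empty();
        }
        callback(result);
    }

private:
    struct Waiter {
        std::condition_variable condition;
        bool shouldWake { false };
    };
    std::mutex m_mutex;
    std::deque<Waiter*> m_waiters;
};

class HerdFreeLockSketch {
public:
    void lock()
    {
        for (;;) {
            std::uint8_t current = m_byte.load();
            if (!(current & isHeldBit)) {
                if (m_byte.compare_exchange_weak(current, static_cast<std::uint8_t>(current | isHeldBit)))
                    return;
                continue;
            }
            // Advertise that a thread is about to park, then park if nothing changed.
            if (!(current & hasParkedBit)
                && !m_byte.compare_exchange_weak(current, static_cast<std::uint8_t>(current | hasParkedBit)))
                continue;
            m_queue.parkConditionally([&] { return m_byte.load() == (isHeldBit | hasParkedBit); });
        }
    }

    void unlock()
    {
        std::uint8_t expected = isHeldBit;
        if (m_byte.compare_exchange_weak(expected, 0))
            return;
        // Slow path: release the lock inside the callback, keeping the
        // hasParkedBit only if the queue may still contain threads.
        m_queue.unparkOne([&](UnparkResultSketch result) {
            m_byte.store(result.mayHaveMoreThreads ? hasParkedBit : 0);
        });
    }

    bool isFullyReset() const { return !m_byte.load(); }

private:
    static constexpr std::uint8_t isHeldBit = 1;
    static constexpr std::uint8_t hasParkedBit = 2;
    std::atomic<std::uint8_t> m_byte { 0 };
    ParkingQueueSketch m_queue;
};

// Returns true if no increment was lost and the lock ended in its pristine state.
inline bool runUnparkOneDemo(unsigned threadCount, unsigned iterationsPerThread)
{
    HerdFreeLockSketch lock;
    unsigned counter = 0;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < threadCount; ++i) {
        threads.emplace_back([&] {
            for (unsigned j = 0; j < iterationsPerThread; ++j) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& thread : threads)
        thread.join();
    return counter == threadCount * iterationsPerThread && lock.isFullyReset();
}
```

Because the store in the callback happens while the queue mutex is held, a thread that has set the hasParkedBit but has not yet parked will fail its validation and retry instead of sleeping forever.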
This required some careful reworking of the ParkingLot algorithm. The first thing I noticed
was that the ThreadData::shouldPark flag was useless, since it's set exactly when
ThreadData::address is non-null. Then I had to make sure that dequeue() could lazily create
both hashtables and buckets, since the "callback is called while queue is locked" invariant
requires that we didn't exit early due to the hashtable or bucket not being present. Note
that all of this is done in such a way that the old unparkOne() and unparkAll() don't have
to create any buckets, though they now may create the hashtable. We don't care as much about
the hashtable being created by unpark since it's just such an unlikely scenario and it would
only happen once.
This change reduces the kernel CPU usage of WTF::Lock for the long critical section test by
about 8x and makes it always perform as well as WTF::WordLock and WTF::Mutex for that
benchmark.
* benchmarks/LockSpeedTest.cpp:
* wtf/Lock.cpp:
(WTF::LockBase::unlockSlow):
* wtf/Lock.h:
(WTF::LockBase::isLocked):
(WTF::LockBase::isFullyReset):
* wtf/ParkingLot.cpp:
(WTF::ParkingLot::parkConditionally):
(WTF::ParkingLot::unparkOne):
(WTF::ParkingLot::unparkAll):
* wtf/ParkingLot.h:
* wtf/WordLock.h:
(WTF::WordLock::isLocked):
(WTF::WordLock::isFullyReset):
Tools:
Add testing that checks that locks return to a pristine state after contention is over.
* TestWebKitAPI/Tests/WTF/Lock.cpp:
(TestWebKitAPI::LockInspector::isFullyReset):
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/166072@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-13 03:51:25 +00:00
    // Method used for testing only.
    bool isFullyReset() const
    {
        return !m_byte.load();
    }
2017-12-07 03:52:09 +00:00
    Atomic<uint8_t> m_byte { 0 };
2015-08-21 00:47:16 +00:00
};
Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
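The three levels of adaptation can be illustrated with a small sketch. This is not the WTF::Lock implementation: AdaptiveLockSketch, its bit names, the spin limit of 40 and runAdaptiveLockDemo are all assumptions made for the example, and a std::mutex plus std::condition_variable stands in for the queue of thread nodes that the real lock encodes in its tagged pointer.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

class AdaptiveLockSketch {
public:
    void lock()
    {
        // Level 1: inlined fast path, a single CAS on a lock that is not held.
        std::uint8_t expected = 0;
        if (m_state.compare_exchange_weak(expected, isHeldBit, std::memory_order_acquire))
            return;
        lockSlow();
    }

    void unlock()
    {
        // Fast path: the lock is held and nobody is sleeping on it.
        std::uint8_t expected = isHeldBit;
        if (m_state.compare_exchange_weak(expected, 0, std::memory_order_release))
            return;
        unlockSlow();
    }

private:
    static constexpr std::uint8_t isHeldBit = 1;
    static constexpr std::uint8_t hasSleepersBit = 2;
    static constexpr unsigned spinLimit = 40; // Arbitrary value chosen for the sketch.

    void lockSlow()
    {
        unsigned spinCount = 0;
        for (;;) {
            std::uint8_t state = m_state.load(std::memory_order_relaxed);
            if (!(state & isHeldBit)) {
                if (m_state.compare_exchange_weak(state, static_cast<std::uint8_t>(state | isHeldBit), std::memory_order_acquire))
                    return;
                continue;
            }
            // Level 2: a short burst of spinning for microcontention.
            if (spinCount < spinLimit) {
                ++spinCount;
                std::this_thread::yield();
                continue;
            }
            // Level 3: go to sleep until an unlock wakes us.
            std::unique_lock<std::mutex> guard(m_sleepMutex);
            state = m_state.load();
            if (!(state & isHeldBit))
                continue;
            if (!(state & hasSleepersBit)
                && !m_state.compare_exchange_weak(state, static_cast<std::uint8_t>(state | hasSleepersBit)))
                continue;
            m_sleepCondition.wait(guard);
        }
    }

    void unlockSlow()
    {
        // Also reached when the weak CAS in unlock() fails spuriously, in
        // which case there may be no sleepers and notify_all() does nothing.
        std::lock_guard<std::mutex> guard(m_sleepMutex);
        m_state.store(0, std::memory_order_release);
        m_sleepCondition.notify_all();
    }

    std::atomic<std::uint8_t> m_state { 0 };
    std::mutex m_sleepMutex;
    std::condition_variable m_sleepCondition;
};

// Hypothetical driver: several threads increment a counter under the lock.
inline unsigned runAdaptiveLockDemo(unsigned threadCount, unsigned iterationsPerThread)
{
    AdaptiveLockSketch lock;
    unsigned counter = 0;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < threadCount; ++i) {
        threads.emplace_back([&] {
            for (unsigned j = 0; j < iterationsPerThread; ++j) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& thread : threads)
        thread.join();
    return counter;
}
```

The sketch wakes every sleeper on unlock, which keeps it short but is not what a production lock would want under heavy contention.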
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
This patch replaces all uses of SpinLock with Lock. Our critical sections are not no-ops,
and if you do basically anything in your critical section, the Lock overhead will be
invisible. Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
as it should be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space savings
of a lock that uses just a byte still outweigh the CPU time savings of Lock. But this work
will enable some future work to create locks that will fit in just 1.6
bits: https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to slow path spuriously, it may go there even if
there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal with the
possibility of a null queue head.
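To make the null queue head case concrete, here is a deliberately simplified, single-threaded model of a tagged-pointer lock word. TaggedLockWordSketch, WaiterNodeSketch and enqueueForTesting are hypothetical names, no real thread is parked or woken, and the queue manipulation is not thread-safe. It only illustrates that unlockSlow() has to cope with being entered when no node is queued, which is what happens when compare_exchange_weak fails spuriously in unlock().

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical waiter node; it only records whether it was "woken".
struct WaiterNodeSketch {
    WaiterNodeSketch* next { nullptr };
    bool wasWoken { false };
};

// Model of a lock word: bit 0 means "held", the other bits hold the queue head.
class TaggedLockWordSketch {
public:
    bool tryLock()
    {
        std::uintptr_t word = m_word.load(std::memory_order_relaxed);
        if (word & isHeldBit)
            return false;
        return m_word.compare_exchange_strong(word, word | isHeldBit, std::memory_order_acquire);
    }

    void unlock()
    {
        // A weak CAS may fail even though the word equals isHeldBit.
        std::uintptr_t expected = isHeldBit;
        if (m_word.compare_exchange_weak(expected, 0, std::memory_order_release))
            return;
        unlockSlow();
    }

    // Returns true if a queued node was dequeued and marked as woken.
    bool unlockSlow()
    {
        std::uintptr_t word = m_word.load(std::memory_order_relaxed);
        auto* head = reinterpret_cast<WaiterNodeSketch*>(word & ~isHeldBit);
        if (!head) {
            // Spurious trip to the slow path: nobody is queued, just release.
            m_word.store(0, std::memory_order_release);
            return false;
        }
        // Release the lock while keeping the rest of the queue in the word.
        m_word.store(reinterpret_cast<std::uintptr_t>(head->next), std::memory_order_release);
        head->next = nullptr;
        head->wasWoken = true;
        return true;
    }

    // Test helper for this single-threaded model only.
    void enqueueForTesting(WaiterNodeSketch* node)
    {
        std::uintptr_t word = m_word.load(std::memory_order_relaxed);
        node->next = reinterpret_cast<WaiterNodeSketch*>(word & ~isHeldBit);
        m_word.store(reinterpret_cast<std::uintptr_t>(node) | (word & isHeldBit), std::memory_order_relaxed);
    }

    bool isHeld() const { return m_word.load(std::memory_order_relaxed) & isHeldBit; }

private:
    static constexpr std::uintptr_t isHeldBit = 1;
    std::atomic<std::uintptr_t> m_word { 0 };
};
```

Calling unlockSlow() directly on a held lock with an empty queue simulates the spurious failure: the lock must simply end up released.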
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
Make CheckedLock the default Lock
https://bugs.webkit.org/show_bug.cgi?id=226157
Reviewed by Darin Adler.
Source/JavaScriptCore:
Make CheckedLock the default Lock so that we get more benefits from Clang
Thread Safety Analysis. Note that CheckedLock relies 100% on the existing
Lock implementation and merely adds the Clang annotations for thread
safety.
What this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow up to get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard to do so in
this patch, in order to keep the patch small.
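For readers unfamiliar with Clang Thread Safety Analysis, the following sketch shows the kind of annotations involved. It does not use the WTF macros: SKETCH_THREAD_ANNOTATION, AnnotatedLockSketch, ScopedLockerSketch and CounterSketch are hypothetical names, and the lock simply wraps std::mutex. The attributes themselves (capability, scoped_lockable, acquire_capability, release_capability, guarded_by) are the ones Clang documents; on other compilers the macro expands to nothing.

```cpp
#include <cassert>
#include <mutex>

#if defined(__clang__)
#define SKETCH_THREAD_ANNOTATION(x) __attribute__((x))
#else
#define SKETCH_THREAD_ANNOTATION(x)
#endif

// A lock type that the analysis treats as a capability.
class SKETCH_THREAD_ANNOTATION(capability("mutex")) AnnotatedLockSketch {
public:
    void lock() SKETCH_THREAD_ANNOTATION(acquire_capability()) { m_mutex.lock(); }
    void unlock() SKETCH_THREAD_ANNOTATION(release_capability()) { m_mutex.unlock(); }

private:
    std::mutex m_mutex;
};

// A non-movable scoped holder, constructible only from a lock reference.
class SKETCH_THREAD_ANNOTATION(scoped_lockable) ScopedLockerSketch {
public:
    explicit ScopedLockerSketch(AnnotatedLockSketch& lock) SKETCH_THREAD_ANNOTATION(acquire_capability(lock))
        : m_lock(lock)
    {
        m_lock.lock();
    }

    ~ScopedLockerSketch() SKETCH_THREAD_ANNOTATION(release_capability()) { m_lock.unlock(); }

    ScopedLockerSketch(const ScopedLockerSketch&) = delete;
    ScopedLockerSketch& operator=(const ScopedLockerSketch&) = delete;

private:
    AnnotatedLockSketch& m_lock;
};

class CounterSketch {
public:
    void increment()
    {
        ScopedLockerSketch locker { m_lock };
        ++m_value;
    }

    int value()
    {
        ScopedLockerSketch locker { m_lock };
        return m_value;
    }

    // Reading m_value here without holding m_lock is what Clang's
    // -Wthread-safety is designed to flag at compile time.

private:
    AnnotatedLockSketch m_lock;
    int m_value SKETCH_THREAD_ANNOTATION(guarded_by(m_lock)) { 0 };
};
```

The annotations change nothing at run time; they only give the compiler enough information to check that guarded members are accessed with the lock held.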
* assembler/testmasm.cpp:
* dfg/DFGCommon.cpp:
* dfg/DFGThreadData.h:
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::Worklist):
* dfg/DFGWorklist.h:
* dynbench.cpp:
* heap/BlockDirectory.h:
(JSC::BlockDirectory::bitvectorLock):
* heap/CodeBlockSet.h:
(JSC::CodeBlockSet::getLock):
* heap/Heap.cpp:
(JSC::Heap::Heap):
* heap/Heap.h:
* heap/MarkedSpace.h:
(JSC::MarkedSpace::directoryLock):
* heap/MarkingConstraintSolver.h:
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::donateKnownParallel):
* heap/SlotVisitor.h:
* jit/ExecutableAllocator.cpp:
(JSC::ExecutableAllocator::getLock const):
(JSC::dumpJITMemory):
* jit/ExecutableAllocator.h:
(JSC::ExecutableAllocatorBase::getLock const):
* jit/JITWorklist.cpp:
(JSC::JITWorklist::JITWorklist):
* jit/JITWorklist.h:
* jsc.cpp:
* profiler/ProfilerDatabase.h:
* runtime/ConcurrentJSLock.h:
* runtime/DeferredWorkTimer.h:
* runtime/JSLock.h:
* runtime/SamplingProfiler.cpp:
(JSC::FrameWalker::FrameWalker):
(JSC::CFrameWalker::CFrameWalker):
(JSC::SamplingProfiler::takeSample):
* runtime/SamplingProfiler.h:
(JSC::SamplingProfiler::getLock):
* runtime/VM.h:
* runtime/VMTraps.cpp:
(JSC::VMTraps::invalidateCodeBlocksOnStack):
(JSC::VMTraps::VMTraps):
* runtime/VMTraps.h:
* tools/FunctionOverrides.h:
* tools/VMInspector.cpp:
(JSC::ensureIsSafeToLock):
* tools/VMInspector.h:
(JSC::VMInspector::getLock):
* wasm/WasmCalleeRegistry.h:
(JSC::Wasm::CalleeRegistry::getLock):
* wasm/WasmPlan.h:
* wasm/WasmStreamingCompiler.h:
* wasm/WasmThunks.h:
* wasm/WasmWorklist.cpp:
(JSC::Wasm::Worklist::Worklist):
* wasm/WasmWorklist.h:
Source/WebCore:
* Modules/indexeddb/server/IDBServer.cpp:
* Modules/webaudio/MediaElementAudioSourceNode.h:
* Modules/webdatabase/OriginLock.cpp:
* bindings/js/JSDOMGlobalObject.h:
* dom/Node.cpp:
* html/HTMLMediaElement.cpp:
(WebCore::HTMLMediaElement::createMediaPlayer):
* html/canvas/WebGLContextGroup.cpp:
(WebCore::WebGLContextGroup::objectGraphLockForAContext):
* html/canvas/WebGLContextGroup.h:
* html/canvas/WebGLContextObject.cpp:
(WebCore::WebGLContextObject::objectGraphLockForContext):
* html/canvas/WebGLContextObject.h:
* html/canvas/WebGLObject.h:
* html/canvas/WebGLRenderingContextBase.cpp:
(WebCore::WebGLRenderingContextBase::objectGraphLock):
* html/canvas/WebGLRenderingContextBase.h:
* html/canvas/WebGLSharedObject.cpp:
(WebCore::WebGLSharedObject::objectGraphLockForContext):
* html/canvas/WebGLSharedObject.h:
* page/scrolling/mac/ScrollingTreeMac.h:
* platform/audio/ReverbConvolver.cpp:
(WebCore::ReverbConvolver::backgroundThreadEntry):
* platform/graphics/ShadowBlur.cpp:
(WebCore::ScratchBuffer::lock):
(WebCore::ShadowBlur::drawRectShadowWithTiling):
(WebCore::ShadowBlur::drawInsetShadowWithTiling):
* platform/graphics/gstreamer/VideoSinkGStreamer.cpp:
* platform/graphics/gstreamer/eme/WebKitCommonEncryptionDecryptorGStreamer.cpp:
Source/WebKit:
* GPUProcess/graphics/RemoteGraphicsContextGL.cpp:
(WebKit::RemoteGraphicsContextGL::paintPixelBufferToImageBuffer):
* NetworkProcess/IndexedDB/WebIDBServer.cpp:
* UIProcess/API/glib/IconDatabase.h:
* UIProcess/mac/WKPrintingView.mm:
(-[WKPrintingView knowsPageRange:]):
Source/WTF:
* wtf/AutomaticThread.cpp:
(WTF::AutomaticThreadCondition::wait):
(WTF::AutomaticThreadCondition::waitFor):
(WTF::AutomaticThread::AutomaticThread):
* wtf/AutomaticThread.h:
* wtf/CheckedCondition.h:
* wtf/CheckedLock.h:
* wtf/Condition.h:
* wtf/Lock.cpp:
(WTF::UncheckedLock::lockSlow):
(WTF::UncheckedLock::unlockSlow):
(WTF::UncheckedLock::unlockFairlySlow):
(WTF::UncheckedLock::safepointSlow):
* wtf/Lock.h:
(WTF::WTF_ASSERTS_ACQUIRED_LOCK):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocator::MetaAllocator):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
* wtf/MetaAllocator.h:
* wtf/ParallelHelperPool.cpp:
(WTF::ParallelHelperPool::ParallelHelperPool):
* wtf/ParallelHelperPool.h:
* wtf/RecursiveLockAdapter.h:
* wtf/WorkerPool.cpp:
(WTF::WorkerPool::WorkerPool):
* wtf/WorkerPool.h:
Tools:
* TestWebKitAPI/Tests/WTF/CheckedConditionTest.cpp:
* TestWebKitAPI/Tests/WTF/Condition.cpp:
* TestWebKitAPI/Tests/WTF/MetaAllocator.cpp:
* WebKitTestRunner/InjectedBundle/AccessibilityController.cpp:
(WTR::AXThread::createThreadIfNeeded):
Canonical link: https://commits.webkit.org/238070@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277943 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-24 05:37:41 +00:00
// Asserts that the lock is held.
// This can be used in cases where the annotations cannot be added to the function
// declaration.
inline void assertIsHeld(const Lock& lock) WTF_ASSERTS_ACQUIRED_LOCK(lock) { ASSERT_UNUSED(lock, lock.isHeld()); }

// Locker specialization to use with Lock.
// Non-movable simple scoped lock holder.
// Example: Locker locker { m_lock };
template <>
2021-05-25 23:19:19 +00:00
class WTF_CAPABILITY_SCOPED_LOCK Locker<Lock> : public AbstractLocker {
|
Make CheckedLock the default Lock
https://bugs.webkit.org/show_bug.cgi?id=226157
Reviewed by Darin Adler.
Make CheckedLock the default Lock so that we get more benefits from Clang
Thread Safety Analysis. Note that CheckedLock 100% relies on the existing
Source/JavaScriptCore:
Lock implementation and merely adds the clang anotations for thread
safety.
That this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow-up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard in this patch
to reduce patch size.
* assembler/testmasm.cpp:
* dfg/DFGCommon.cpp:
* dfg/DFGThreadData.h:
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::Worklist):
* dfg/DFGWorklist.h:
* dynbench.cpp:
* heap/BlockDirectory.h:
(JSC::BlockDirectory::bitvectorLock):
* heap/CodeBlockSet.h:
(JSC::CodeBlockSet::getLock):
* heap/Heap.cpp:
(JSC::Heap::Heap):
* heap/Heap.h:
* heap/MarkedSpace.h:
(JSC::MarkedSpace::directoryLock):
* heap/MarkingConstraintSolver.h:
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::donateKnownParallel):
* heap/SlotVisitor.h:
* jit/ExecutableAllocator.cpp:
(JSC::ExecutableAllocator::getLock const):
(JSC::dumpJITMemory):
* jit/ExecutableAllocator.h:
(JSC::ExecutableAllocatorBase::getLock const):
* jit/JITWorklist.cpp:
(JSC::JITWorklist::JITWorklist):
* jit/JITWorklist.h:
* jsc.cpp:
* profiler/ProfilerDatabase.h:
* runtime/ConcurrentJSLock.h:
* runtime/DeferredWorkTimer.h:
* runtime/JSLock.h:
* runtime/SamplingProfiler.cpp:
(JSC::FrameWalker::FrameWalker):
(JSC::CFrameWalker::CFrameWalker):
(JSC::SamplingProfiler::takeSample):
* runtime/SamplingProfiler.h:
(JSC::SamplingProfiler::getLock):
* runtime/VM.h:
* runtime/VMTraps.cpp:
(JSC::VMTraps::invalidateCodeBlocksOnStack):
(JSC::VMTraps::VMTraps):
* runtime/VMTraps.h:
* tools/FunctionOverrides.h:
* tools/VMInspector.cpp:
(JSC::ensureIsSafeToLock):
* tools/VMInspector.h:
(JSC::VMInspector::getLock):
* wasm/WasmCalleeRegistry.h:
(JSC::Wasm::CalleeRegistry::getLock):
* wasm/WasmPlan.h:
* wasm/WasmStreamingCompiler.h:
* wasm/WasmThunks.h:
* wasm/WasmWorklist.cpp:
(JSC::Wasm::Worklist::Worklist):
* wasm/WasmWorklist.h:
Source/WebCore:
Lock implementation and merely adds the clang anotations for thread
safety.
That this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow-up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard in this patch
to reduce patch size.
* Modules/indexeddb/server/IDBServer.cpp:
* Modules/webaudio/MediaElementAudioSourceNode.h:
* Modules/webdatabase/OriginLock.cpp:
* bindings/js/JSDOMGlobalObject.h:
* dom/Node.cpp:
* html/HTMLMediaElement.cpp:
(WebCore::HTMLMediaElement::createMediaPlayer):
* html/canvas/WebGLContextGroup.cpp:
(WebCore::WebGLContextGroup::objectGraphLockForAContext):
* html/canvas/WebGLContextGroup.h:
* html/canvas/WebGLContextObject.cpp:
(WebCore::WebGLContextObject::objectGraphLockForContext):
* html/canvas/WebGLContextObject.h:
* html/canvas/WebGLObject.h:
* html/canvas/WebGLRenderingContextBase.cpp:
(WebCore::WebGLRenderingContextBase::objectGraphLock):
* html/canvas/WebGLRenderingContextBase.h:
* html/canvas/WebGLSharedObject.cpp:
(WebCore::WebGLSharedObject::objectGraphLockForContext):
* html/canvas/WebGLSharedObject.h:
* page/scrolling/mac/ScrollingTreeMac.h:
* platform/audio/ReverbConvolver.cpp:
(WebCore::ReverbConvolver::backgroundThreadEntry):
* platform/graphics/ShadowBlur.cpp:
(WebCore::ScratchBuffer::lock):
(WebCore::ShadowBlur::drawRectShadowWithTiling):
(WebCore::ShadowBlur::drawInsetShadowWithTiling):
* platform/graphics/gstreamer/VideoSinkGStreamer.cpp:
* platform/graphics/gstreamer/eme/WebKitCommonEncryptionDecryptorGStreamer.cpp:
Source/WebKit:
Lock implementation and merely adds the clang anotations for thread
safety.
That this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow-up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard in this patch
to reduce patch size.
* GPUProcess/graphics/RemoteGraphicsContextGL.cpp:
(WebKit::RemoteGraphicsContextGL::paintPixelBufferToImageBuffer):
* NetworkProcess/IndexedDB/WebIDBServer.cpp:
* UIProcess/API/glib/IconDatabase.h:
* UIProcess/mac/WKPrintingView.mm:
(-[WKPrintingView knowsPageRange:]):
Source/WTF:
Lock implementation and merely adds the clang anotations for thread
safety.
That this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow-up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard in this patch
to reduce patch size.
* wtf/AutomaticThread.cpp:
(WTF::AutomaticThreadCondition::wait):
(WTF::AutomaticThreadCondition::waitFor):
(WTF::AutomaticThread::AutomaticThread):
* wtf/AutomaticThread.h:
* wtf/CheckedCondition.h:
* wtf/CheckedLock.h:
* wtf/Condition.h:
* wtf/Lock.cpp:
(WTF::UncheckedLock::lockSlow):
(WTF::UncheckedLock::unlockSlow):
(WTF::UncheckedLock::unlockFairlySlow):
(WTF::UncheckedLock::safepointSlow):
* wtf/Lock.h:
(WTF::WTF_ASSERTS_ACQUIRED_LOCK):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocator::MetaAllocator):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
* wtf/MetaAllocator.h:
* wtf/ParallelHelperPool.cpp:
(WTF::ParallelHelperPool::ParallelHelperPool):
* wtf/ParallelHelperPool.h:
* wtf/RecursiveLockAdapter.h:
* wtf/WorkerPool.cpp:
(WTF::WorkerPool::WorkerPool):
* wtf/WorkerPool.h:
Tools:
Lock implementation and merely adds the Clang annotations for thread
safety.
What this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard in this patch
to reduce patch size.
* TestWebKitAPI/Tests/WTF/CheckedConditionTest.cpp:
* TestWebKitAPI/Tests/WTF/Condition.cpp:
* TestWebKitAPI/Tests/WTF/MetaAllocator.cpp:
* WebKitTestRunner/InjectedBundle/AccessibilityController.cpp:
(WTR::AXThread::createThreadIfNeeded):
Canonical link: https://commits.webkit.org/238070@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277943 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-24 05:37:41 +00:00
public:
    explicit Locker(Lock& lock) WTF_ACQUIRES_LOCK(lock)
        : m_lock(lock)
2021-05-26 00:37:19 +00:00
        , m_isLocked(true)
Make CheckedLock the default Lock
https://bugs.webkit.org/show_bug.cgi?id=226157
Reviewed by Darin Adler.
Make CheckedLock the default Lock so that we get more benefits from Clang
Thread Safety Analysis. Note that CheckedLock 100% relies on the existing
Source/JavaScriptCore:
Lock implementation and merely adds the Clang annotations for thread
safety.
What this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard in this patch
to reduce patch size.
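The Locker restrictions listed above (reference-only construction, no move constructor) can be illustrated with a minimal sketch. It assumes std::mutex as the underlying lock and uses hypothetical names (SKETCH_TSA, SketchLock, SketchLocker, Counter) that are not part of WTF; the macro expands to Clang thread safety attributes under Clang and to nothing on other compilers.

```cpp
#include <cassert>
#include <mutex>
#include <type_traits>

// Hypothetical helper macro for this sketch only (not a WTF macro).
#if defined(__clang__)
#define SKETCH_TSA(x) __attribute__((x))
#else
#define SKETCH_TSA(x)
#endif

// A lock type the analysis can track as a capability. The method bodies opt
// out of the analysis; the annotations describe the contract to callers.
class SKETCH_TSA(capability("mutex")) SketchLock {
public:
    void lock() SKETCH_TSA(acquire_capability()) SKETCH_TSA(no_thread_safety_analysis) { m_mutex.lock(); }
    void unlock() SKETCH_TSA(release_capability()) SKETCH_TSA(no_thread_safety_analysis) { m_mutex.unlock(); }

private:
    std::mutex m_mutex;
};

// Scoped guard: constructed from a reference only, neither copyable nor movable.
class SKETCH_TSA(scoped_lockable) SketchLocker {
public:
    explicit SketchLocker(SketchLock& lock) SKETCH_TSA(acquire_capability(lock))
        : m_lock(lock)
    {
        m_lock.lock();
    }
    ~SketchLocker() SKETCH_TSA(release_capability()) { m_lock.unlock(); }
    SketchLocker(const SketchLocker&) = delete;
    SketchLocker& operator=(const SketchLocker&) = delete;

private:
    SketchLock& m_lock;
};

// The two restrictions named in the commit message, checked at compile time.
static_assert(!std::is_move_constructible<SketchLocker>::value, "no move constructor");
static_assert(!std::is_constructible<SketchLocker, SketchLock*>::value, "no construction from a pointer");

// A class whose member may only be touched while m_lock is held.
class Counter {
public:
    int increment()
    {
        SketchLocker locker(m_lock); // Acquired here, released at scope exit.
        return ++m_value;
    }

private:
    SketchLock m_lock;
    int m_value SKETCH_TSA(guarded_by(m_lock)) { 0 };
};
```

When built with Clang and -Wthread-safety, an access to m_value outside a SketchLocker scope would be expected to produce a warning; other compilers simply ignore the annotations.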
* assembler/testmasm.cpp:
* dfg/DFGCommon.cpp:
* dfg/DFGThreadData.h:
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::Worklist):
* dfg/DFGWorklist.h:
* dynbench.cpp:
* heap/BlockDirectory.h:
(JSC::BlockDirectory::bitvectorLock):
* heap/CodeBlockSet.h:
(JSC::CodeBlockSet::getLock):
* heap/Heap.cpp:
(JSC::Heap::Heap):
* heap/Heap.h:
* heap/MarkedSpace.h:
(JSC::MarkedSpace::directoryLock):
* heap/MarkingConstraintSolver.h:
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::donateKnownParallel):
* heap/SlotVisitor.h:
* jit/ExecutableAllocator.cpp:
(JSC::ExecutableAllocator::getLock const):
(JSC::dumpJITMemory):
* jit/ExecutableAllocator.h:
(JSC::ExecutableAllocatorBase::getLock const):
* jit/JITWorklist.cpp:
(JSC::JITWorklist::JITWorklist):
* jit/JITWorklist.h:
* jsc.cpp:
* profiler/ProfilerDatabase.h:
* runtime/ConcurrentJSLock.h:
* runtime/DeferredWorkTimer.h:
* runtime/JSLock.h:
* runtime/SamplingProfiler.cpp:
(JSC::FrameWalker::FrameWalker):
(JSC::CFrameWalker::CFrameWalker):
(JSC::SamplingProfiler::takeSample):
* runtime/SamplingProfiler.h:
(JSC::SamplingProfiler::getLock):
* runtime/VM.h:
* runtime/VMTraps.cpp:
(JSC::VMTraps::invalidateCodeBlocksOnStack):
(JSC::VMTraps::VMTraps):
* runtime/VMTraps.h:
* tools/FunctionOverrides.h:
* tools/VMInspector.cpp:
(JSC::ensureIsSafeToLock):
* tools/VMInspector.h:
(JSC::VMInspector::getLock):
* wasm/WasmCalleeRegistry.h:
(JSC::Wasm::CalleeRegistry::getLock):
* wasm/WasmPlan.h:
* wasm/WasmStreamingCompiler.h:
* wasm/WasmThunks.h:
* wasm/WasmWorklist.cpp:
(JSC::Wasm::Worklist::Worklist):
* wasm/WasmWorklist.h:
Source/WebCore:
Lock implementation and merely adds the Clang annotations for thread
safety.
What this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard in this patch
to reduce patch size.
* Modules/indexeddb/server/IDBServer.cpp:
* Modules/webaudio/MediaElementAudioSourceNode.h:
* Modules/webdatabase/OriginLock.cpp:
* bindings/js/JSDOMGlobalObject.h:
* dom/Node.cpp:
* html/HTMLMediaElement.cpp:
(WebCore::HTMLMediaElement::createMediaPlayer):
* html/canvas/WebGLContextGroup.cpp:
(WebCore::WebGLContextGroup::objectGraphLockForAContext):
* html/canvas/WebGLContextGroup.h:
* html/canvas/WebGLContextObject.cpp:
(WebCore::WebGLContextObject::objectGraphLockForContext):
* html/canvas/WebGLContextObject.h:
* html/canvas/WebGLObject.h:
* html/canvas/WebGLRenderingContextBase.cpp:
(WebCore::WebGLRenderingContextBase::objectGraphLock):
* html/canvas/WebGLRenderingContextBase.h:
* html/canvas/WebGLSharedObject.cpp:
(WebCore::WebGLSharedObject::objectGraphLockForContext):
* html/canvas/WebGLSharedObject.h:
* page/scrolling/mac/ScrollingTreeMac.h:
* platform/audio/ReverbConvolver.cpp:
(WebCore::ReverbConvolver::backgroundThreadEntry):
* platform/graphics/ShadowBlur.cpp:
(WebCore::ScratchBuffer::lock):
(WebCore::ShadowBlur::drawRectShadowWithTiling):
(WebCore::ShadowBlur::drawInsetShadowWithTiling):
* platform/graphics/gstreamer/VideoSinkGStreamer.cpp:
* platform/graphics/gstreamer/eme/WebKitCommonEncryptionDecryptorGStreamer.cpp:
Source/WebKit:
Lock implementation and merely adds the Clang annotations for thread
safety.
What this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard in this patch
to reduce patch size.
* GPUProcess/graphics/RemoteGraphicsContextGL.cpp:
(WebKit::RemoteGraphicsContextGL::paintPixelBufferToImageBuffer):
* NetworkProcess/IndexedDB/WebIDBServer.cpp:
* UIProcess/API/glib/IconDatabase.h:
* UIProcess/mac/WKPrintingView.mm:
(-[WKPrintingView knowsPageRange:]):
Source/WTF:
Lock implementation and merely adds the Clang annotations for thread
safety.
What this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard in this patch
to reduce patch size.
* wtf/AutomaticThread.cpp:
(WTF::AutomaticThreadCondition::wait):
(WTF::AutomaticThreadCondition::waitFor):
(WTF::AutomaticThread::AutomaticThread):
* wtf/AutomaticThread.h:
* wtf/CheckedCondition.h:
* wtf/CheckedLock.h:
* wtf/Condition.h:
* wtf/Lock.cpp:
(WTF::UncheckedLock::lockSlow):
(WTF::UncheckedLock::unlockSlow):
(WTF::UncheckedLock::unlockFairlySlow):
(WTF::UncheckedLock::safepointSlow):
* wtf/Lock.h:
(WTF::WTF_ASSERTS_ACQUIRED_LOCK):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocator::MetaAllocator):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
* wtf/MetaAllocator.h:
* wtf/ParallelHelperPool.cpp:
(WTF::ParallelHelperPool::ParallelHelperPool):
* wtf/ParallelHelperPool.h:
* wtf/RecursiveLockAdapter.h:
* wtf/WorkerPool.cpp:
(WTF::WorkerPool::WorkerPool):
* wtf/WorkerPool.h:
Tools:
Lock implementation and merely adds the Clang annotations for thread
safety.
What this patch does is:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition if I got a build failure. Build
failures are usually caused by the following facts:
- Locker<CheckedLock> doesn't subclass AbstractLocker which a lot of
JSC code passes as argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow up to try and get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. I did not try very hard in this patch
to reduce patch size.
* TestWebKitAPI/Tests/WTF/CheckedConditionTest.cpp:
* TestWebKitAPI/Tests/WTF/Condition.cpp:
* TestWebKitAPI/Tests/WTF/MetaAllocator.cpp:
* WebKitTestRunner/InjectedBundle/AccessibilityController.cpp:
(WTR::AXThread::createThreadIfNeeded):
Canonical link: https://commits.webkit.org/238070@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277943 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-24 05:37:41 +00:00
    {
        m_lock.lock();
    }
    Locker(AdoptLockTag, Lock& lock) WTF_REQUIRES_LOCK(lock)
        : m_lock(lock)
2021-05-26 00:37:19 +00:00
        , m_isLocked(true)
    {
    }
    ~Locker() WTF_RELEASES_LOCK()
    {
2021-05-26 00:37:19 +00:00
        if (m_isLocked)
            m_lock.unlock();
    }
    void unlockEarly() WTF_RELEASES_LOCK()
    {
        ASSERT(m_isLocked);
        m_isLocked = false;
    m_lock.unlock();
}
Locker(const Locker<Lock>&) = delete;
Locker& operator=(const Locker<Lock>&) = delete;
2021-05-25 23:19:19 +00:00
Make CheckedLock the default Lock
https://bugs.webkit.org/show_bug.cgi?id=226157
Reviewed by Darin Adler.
Make CheckedLock the default Lock so that we get more benefits from Clang
Thread Safety Analysis. Note that CheckedLock relies entirely on the existing
Lock implementation and merely adds the Clang annotations for thread
safety.
What this patch does:
1. Rename the Lock class to UncheckedLock
2. Rename the CheckedLock class to Lock
3. Rename the Condition class to UncheckedCondition
4. Rename the CheckedCondition class to Condition
5. Update the types of certain variables from Lock / Condition to
UncheckedLock / UncheckedCondition wherever I got a build failure. Build
failures are usually caused by the following:
- Locker<CheckedLock> doesn't subclass AbstractLocker, which a lot of
JSC code passes as an argument
- Locker<CheckedLock> has no move constructor
- Locker<CheckedLock> cannot be constructed from a lock pointer, only
a reference
For now, CheckedLock and CheckedCondition remain as aliases to Lock and
Condition, in their respective CheckedLock.h / CheckedCondition.h headers.
I will drop them in a follow-up to reduce patch size.
I will also follow up to try to get rid of as much usage of UncheckedLock
and UncheckedCondition as possible. To keep this patch small, I did not
try very hard to do so here.
Source/JavaScriptCore:
* assembler/testmasm.cpp:
* dfg/DFGCommon.cpp:
* dfg/DFGThreadData.h:
* dfg/DFGWorklist.cpp:
(JSC::DFG::Worklist::Worklist):
* dfg/DFGWorklist.h:
* dynbench.cpp:
* heap/BlockDirectory.h:
(JSC::BlockDirectory::bitvectorLock):
* heap/CodeBlockSet.h:
(JSC::CodeBlockSet::getLock):
* heap/Heap.cpp:
(JSC::Heap::Heap):
* heap/Heap.h:
* heap/MarkedSpace.h:
(JSC::MarkedSpace::directoryLock):
* heap/MarkingConstraintSolver.h:
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::donateKnownParallel):
* heap/SlotVisitor.h:
* jit/ExecutableAllocator.cpp:
(JSC::ExecutableAllocator::getLock const):
(JSC::dumpJITMemory):
* jit/ExecutableAllocator.h:
(JSC::ExecutableAllocatorBase::getLock const):
* jit/JITWorklist.cpp:
(JSC::JITWorklist::JITWorklist):
* jit/JITWorklist.h:
* jsc.cpp:
* profiler/ProfilerDatabase.h:
* runtime/ConcurrentJSLock.h:
* runtime/DeferredWorkTimer.h:
* runtime/JSLock.h:
* runtime/SamplingProfiler.cpp:
(JSC::FrameWalker::FrameWalker):
(JSC::CFrameWalker::CFrameWalker):
(JSC::SamplingProfiler::takeSample):
* runtime/SamplingProfiler.h:
(JSC::SamplingProfiler::getLock):
* runtime/VM.h:
* runtime/VMTraps.cpp:
(JSC::VMTraps::invalidateCodeBlocksOnStack):
(JSC::VMTraps::VMTraps):
* runtime/VMTraps.h:
* tools/FunctionOverrides.h:
* tools/VMInspector.cpp:
(JSC::ensureIsSafeToLock):
* tools/VMInspector.h:
(JSC::VMInspector::getLock):
* wasm/WasmCalleeRegistry.h:
(JSC::Wasm::CalleeRegistry::getLock):
* wasm/WasmPlan.h:
* wasm/WasmStreamingCompiler.h:
* wasm/WasmThunks.h:
* wasm/WasmWorklist.cpp:
(JSC::Wasm::Worklist::Worklist):
* wasm/WasmWorklist.h:
Source/WebCore:
* Modules/indexeddb/server/IDBServer.cpp:
* Modules/webaudio/MediaElementAudioSourceNode.h:
* Modules/webdatabase/OriginLock.cpp:
* bindings/js/JSDOMGlobalObject.h:
* dom/Node.cpp:
* html/HTMLMediaElement.cpp:
(WebCore::HTMLMediaElement::createMediaPlayer):
* html/canvas/WebGLContextGroup.cpp:
(WebCore::WebGLContextGroup::objectGraphLockForAContext):
* html/canvas/WebGLContextGroup.h:
* html/canvas/WebGLContextObject.cpp:
(WebCore::WebGLContextObject::objectGraphLockForContext):
* html/canvas/WebGLContextObject.h:
* html/canvas/WebGLObject.h:
* html/canvas/WebGLRenderingContextBase.cpp:
(WebCore::WebGLRenderingContextBase::objectGraphLock):
* html/canvas/WebGLRenderingContextBase.h:
* html/canvas/WebGLSharedObject.cpp:
(WebCore::WebGLSharedObject::objectGraphLockForContext):
* html/canvas/WebGLSharedObject.h:
* page/scrolling/mac/ScrollingTreeMac.h:
* platform/audio/ReverbConvolver.cpp:
(WebCore::ReverbConvolver::backgroundThreadEntry):
* platform/graphics/ShadowBlur.cpp:
(WebCore::ScratchBuffer::lock):
(WebCore::ShadowBlur::drawRectShadowWithTiling):
(WebCore::ShadowBlur::drawInsetShadowWithTiling):
* platform/graphics/gstreamer/VideoSinkGStreamer.cpp:
* platform/graphics/gstreamer/eme/WebKitCommonEncryptionDecryptorGStreamer.cpp:
Source/WebKit:
* GPUProcess/graphics/RemoteGraphicsContextGL.cpp:
(WebKit::RemoteGraphicsContextGL::paintPixelBufferToImageBuffer):
* NetworkProcess/IndexedDB/WebIDBServer.cpp:
* UIProcess/API/glib/IconDatabase.h:
* UIProcess/mac/WKPrintingView.mm:
(-[WKPrintingView knowsPageRange:]):
Source/WTF:
* wtf/AutomaticThread.cpp:
(WTF::AutomaticThreadCondition::wait):
(WTF::AutomaticThreadCondition::waitFor):
(WTF::AutomaticThread::AutomaticThread):
* wtf/AutomaticThread.h:
* wtf/CheckedCondition.h:
* wtf/CheckedLock.h:
* wtf/Condition.h:
* wtf/Lock.cpp:
(WTF::UncheckedLock::lockSlow):
(WTF::UncheckedLock::unlockSlow):
(WTF::UncheckedLock::unlockFairlySlow):
(WTF::UncheckedLock::safepointSlow):
* wtf/Lock.h:
(WTF::WTF_ASSERTS_ACQUIRED_LOCK):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocator::MetaAllocator):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
* wtf/MetaAllocator.h:
* wtf/ParallelHelperPool.cpp:
(WTF::ParallelHelperPool::ParallelHelperPool):
* wtf/ParallelHelperPool.h:
* wtf/RecursiveLockAdapter.h:
* wtf/WorkerPool.cpp:
(WTF::WorkerPool::WorkerPool):
* wtf/WorkerPool.h:
Tools:
* TestWebKitAPI/Tests/WTF/CheckedConditionTest.cpp:
* TestWebKitAPI/Tests/WTF/Condition.cpp:
* TestWebKitAPI/Tests/WTF/MetaAllocator.cpp:
* WebKitTestRunner/InjectedBundle/AccessibilityController.cpp:
(WTR::AXThread::createThreadIfNeeded):
Canonical link: https://commits.webkit.org/238070@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277943 268f45cc-cd09-0410-ab3c-d52691b4dbfc
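The commit message above describes a lock type that reuses an existing lock implementation and only layers Clang thread safety annotations on top. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not WebKit's code: SKETCH_TSA, AnnotatedLock, ScopedLocker, Counter and demoCounter are hypothetical names, std::mutex stands in for the underlying lock, and the attributes used are the ones from Clang's Thread Safety Analysis documentation. Like the Locker described in the message, ScopedLocker takes a reference and cannot be copied.

```cpp
#include <cassert>
#include <mutex>

// SKETCH_TSA is a hypothetical helper macro for this example only. Under Clang
// it expands to a thread safety attribute; elsewhere it expands to nothing.
#if defined(__clang__)
#define SKETCH_TSA(x) __attribute__((x))
#else
#define SKETCH_TSA(x)
#endif

// An annotated lock layered on an existing implementation. std::mutex is a
// stand-in here; it is not the lock WebKit uses.
class SKETCH_TSA(capability("lock")) AnnotatedLock {
public:
    // The bodies forward to an unannotated lock, so analysis is disabled inside them.
    void lock() SKETCH_TSA(acquire_capability()) SKETCH_TSA(no_thread_safety_analysis) { m_impl.lock(); }
    void unlock() SKETCH_TSA(release_capability()) SKETCH_TSA(no_thread_safety_analysis) { m_impl.unlock(); }

private:
    std::mutex m_impl;
};

// Scoped holder: constructed from a reference only, not copyable.
class SKETCH_TSA(scoped_lockable) ScopedLocker {
public:
    explicit ScopedLocker(AnnotatedLock& lock) SKETCH_TSA(acquire_capability(lock))
        : m_lock(lock)
    {
        m_lock.lock();
    }
    ~ScopedLocker() SKETCH_TSA(release_capability()) { m_lock.unlock(); }

    ScopedLocker(const ScopedLocker&) = delete;
    ScopedLocker& operator=(const ScopedLocker&) = delete;

private:
    AnnotatedLock& m_lock;
};

class Counter {
public:
    int increment()
    {
        ScopedLocker locker(m_lock);
        return ++m_value; // Allowed: m_lock is held in this scope.
    }

private:
    AnnotatedLock m_lock;
    // Touching m_value without holding m_lock is what -Wthread-safety is meant to flag.
    int m_value SKETCH_TSA(guarded_by(m_lock)) { 0 };
};

// Small self-check used to exercise the sketch.
inline void demoCounter()
{
    Counter counter;
    assert(counter.increment() == 1);
    assert(counter.increment() == 2);
}
```

With a non-Clang compiler the annotations vanish and the code behaves like an ordinary mutex wrapper, which matches the description of the checked lock as adding annotations only.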
2021-05-24 05:37:41 +00:00
private:
2021-05-25 23:19:19 +00:00
// Support DropLockForScope even though it doesn't support thread safety analysis.
template<typename>
friend class DropLockForScope;

void lock() WTF_ACQUIRES_LOCK(m_lock)
{
    m_lock.lock();
    compilerFence();
}

void unlock() WTF_RELEASES_LOCK(m_lock)
{
    compilerFence();
    m_lock.unlock();
}
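The lock() and unlock() members above exist so that a DropLockForScope helper can release the lock inside a scope and take it again on exit. The following is a simplified sketch of that pattern, not the WTF implementation: SketchLocker, SketchDropLockForScope, TrackingLock, sketchCompilerFence and demoDropLock are hypothetical names, lock() and unlock() are public here instead of being reached through a friend declaration, the m_isLocked bookkeeping is an assumption, and std::atomic_signal_fence is used as a stand-in for compilerFence(), whose definition is not shown in this excerpt.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a compiler-only fence.
inline void sketchCompilerFence()
{
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Toy single-threaded lock that only records whether it is held.
struct TrackingLock {
    void lock() { assert(!held); held = true; }
    void unlock() { assert(held); held = false; }
    bool held { false };
};

// Simplified scoped locker that also allows explicit lock() / unlock().
template<typename LockType>
class SketchLocker {
public:
    explicit SketchLocker(LockType& lock)
        : m_lock(lock)
    {
        this->lock();
    }

    ~SketchLocker()
    {
        if (m_isLocked)
            unlock();
    }

    SketchLocker(const SketchLocker&) = delete;
    SketchLocker& operator=(const SketchLocker&) = delete;

    void lock()
    {
        m_lock.lock();
        m_isLocked = true;
        sketchCompilerFence();
    }

    void unlock()
    {
        sketchCompilerFence();
        m_isLocked = false;
        m_lock.unlock();
    }

private:
    LockType& m_lock;
    bool m_isLocked { false };
};

// Releases the lock for the duration of a scope and reacquires it on exit.
template<typename LockerType>
class SketchDropLockForScope {
public:
    explicit SketchDropLockForScope(LockerType& locker)
        : m_locker(locker)
    {
        m_locker.unlock();
    }
    ~SketchDropLockForScope() { m_locker.lock(); }

private:
    LockerType& m_locker;
};

// Returns true if the lock was free inside the dropped scope and held again after it.
inline bool demoDropLock()
{
    TrackingLock lock;
    SketchLocker<TrackingLock> locker(lock);
    bool wasFreeInside = false;
    {
        SketchDropLockForScope<SketchLocker<TrackingLock>> drop(locker);
        wasFreeInside = !lock.held;
    }
    return wasFreeInside && lock.held;
}
```

A scoped release like this hands the lock back and forth at runtime in a way that static analysis cannot easily follow, which is consistent with the source comment that DropLockForScope does not support thread safety analysis.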
2021-05-24 05:37:41 +00:00
Lock& m_lock;
2021-05-26 00:37:19 +00:00
bool m_isLocked { false };
2021-05-24 05:37:41 +00:00
};

Locker(Lock&) -> Locker<Lock>;
Locker(AdoptLockTag, Lock&) -> Locker<Lock>;
2017-12-07 03:52:09 +00:00
using LockHolder = Locker<Lock>;
Lightweight locks should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=147545
Reviewed by Geoffrey Garen.
Source/JavaScriptCore:
* dfg/DFGCommon.cpp:
(JSC::DFG::startCrashing):
* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.cpp:
(JSC::CopiedSpace::doneFillingBlock):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.cpp:
(JSC::GCThreadSharedData::didStartCopying):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/MachineStackMarker.cpp:
(JSC::MachineThreads::tryCopyOtherThreadStacks):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* parser/SourceProvider.cpp:
(JSC::SourceProvider::~SourceProvider):
(JSC::SourceProvider::getID):
* profiler/ProfilerDatabase.cpp:
(JSC::Profiler::Database::addDatabaseToAtExit):
(JSC::Profiler::Database::removeDatabaseFromAtExit):
(JSC::Profiler::Database::removeFirstAtExitDatabase):
* runtime/TypeProfilerLog.h:
Source/WebCore:
* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/audio/mac/CARingBuffer.cpp:
(WebCore::CARingBuffer::setCurrentFrameBounds):
(WebCore::CARingBuffer::getCurrentFrameBounds):
* platform/audio/mac/CARingBuffer.h:
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):
Source/WebKit2:
* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:
Source/WTF:
A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than system locks and because they take dramatically less space than system
locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
acquire is up to 10x faster and under microcontention - short critical section with two or
more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.
But there is a problem with this idiom. System lock implementations will sleep a thread when
it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
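
To make the CPU cost concrete, here is a minimal sketch of a spinlock that yields while waiting. It is not the actual WTF::SpinLock code: the class name YieldingSpinLockSketch and the helper countWithSpinLock are hypothetical, and std::this_thread::yield() stands in for sched_yield().

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical name; an illustration only, not the WTF::SpinLock implementation.
class YieldingSpinLockSketch {
public:
    void lock()
    {
        // exchange() returns the previous value: true means another thread holds the lock.
        while (m_isHeld.exchange(true, std::memory_order_acquire))
            std::this_thread::yield(); // The waiter stays runnable and keeps using CPU time.
    }

    void unlock() { m_isHeld.store(false, std::memory_order_release); }

private:
    std::atomic<bool> m_isHeld { false };
};

// Hypothetical helper: every thread increments a shared counter under the lock.
inline long countWithSpinLock(unsigned threadCount, long iterationsPerThread)
{
    YieldingSpinLockSketch lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < threadCount; ++i) {
        threads.emplace_back([&] {
            for (long j = 0; j < iterationsPerThread; ++j) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& thread : threads)
        thread.join();
    return counter;
}
```

With a short critical section like the increment above, waiters rarely yield more than a few times. If the holder kept the lock for a long time, every waiter would loop through yield() for the whole duration, which is the waste described above.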
The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
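
The three levels can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code added in wtf/Lock.cpp: instead of a tagged pointer to a queue of thread nodes, it uses a hasWaiters bit plus a std::mutex and std::condition_variable to put threads to sleep. The names AdaptiveLockSketch, spinLimit, and countWithAdaptiveLock are hypothetical, and the spin limit of 40 is an arbitrary choice.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class AdaptiveLockSketch {
public:
    void lock()
    {
        // Level 1: fast path, one CAS when the lock is not held.
        unsigned expected = 0;
        if (m_state.compare_exchange_weak(expected, isHeldBit, std::memory_order_acquire))
            return;
        lockSlow();
    }

    void unlock()
    {
        // Fast path: no waiters recorded. A weak CAS may also fail spuriously,
        // so the slow path has to cope with there being nobody to wake.
        unsigned expected = isHeldBit;
        if (m_state.compare_exchange_weak(expected, 0, std::memory_order_release))
            return;
        unlockSlow();
    }

private:
    static constexpr unsigned isHeldBit = 1;
    static constexpr unsigned hasWaitersBit = 2;
    static constexpr unsigned spinLimit = 40; // Assumed value, for illustration.

    void lockSlow()
    {
        unsigned spins = 0;
        for (;;) {
            unsigned state = m_state.load();
            if (!(state & isHeldBit)) {
                if (m_state.compare_exchange_weak(state, state | isHeldBit))
                    return;
                continue;
            }
            // Level 2: a short burst of spinning for microcontention.
            if (spins < spinLimit) {
                ++spins;
                std::this_thread::yield();
                continue;
            }
            // Level 3: record that there is a waiter, then sleep.
            std::unique_lock<std::mutex> guard(m_parkMutex);
            state = m_state.load();
            if (!(state & isHeldBit))
                continue; // Released in the meantime; retry the acquire.
            if (!(state & hasWaitersBit)
                && !m_state.compare_exchange_weak(state, state | hasWaitersBit))
                continue;
            m_parkCondition.wait(guard); // Spurious wakeups are fine: we loop and recheck.
        }
    }

    void unlockSlow()
    {
        std::lock_guard<std::mutex> guard(m_parkMutex);
        m_state.store(0, std::memory_order_release); // Clears both bits.
        m_parkCondition.notify_all(); // Harmless if no thread is sleeping.
    }

    std::atomic<unsigned> m_state { 0 };
    std::mutex m_parkMutex;
    std::condition_variable m_parkCondition;
};

// Hypothetical helper: every thread increments a shared counter under the lock.
inline long countWithAdaptiveLock(unsigned threadCount, long iterationsPerThread)
{
    AdaptiveLockSketch lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < threadCount; ++i) {
        threads.emplace_back([&] {
            for (long j = 0; j < iterationsPerThread; ++j) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& thread : threads)
        thread.join();
    return counter;
}
```

Unlike the lock described above, this sketch is much larger than sizeof(void*) because it embeds a system mutex and condition variable. It only illustrates the control flow of the fast path, the spin phase, and the sleep phase.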
On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:
- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
than a system mutex.
- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
than a system mutex.
- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
SpinLock.
- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
a SpinLock.
This patch replaces all uses of SpinLock with Lock. Our critical sections are not no-ops, so
if you do basically anything in your critical section, the Lock overhead will be invisible.
Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead instead of 4.
Performance benchmarking using JSC macrobenchmarks shows no difference, which is as it should
be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space benefits
of having a lock that just uses a byte still outweigh the CPU time savings of Lock. But this
work will enable some future work to create locks that will fit in just 1.6 bits:
https://bugs.webkit.org/show_bug.cgi?id=147665.
Rolling this back in after fixing Lock::unlockSlow() for architectures that have a truly weak
CAS. Since the Lock::unlock() fast path can go to slow path spuriously, it may go there even if
there aren't any threads on the Lock's queue. So, unlockSlow() must be able to deal with the
possibility of a null queue head.
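
The null-queue-head case can be illustrated with a small sketch of a tagged lock word. This is an assumption-laden illustration, not the actual Lock::unlockSlow(): the names ThreadNodeSketch, makeLockWord, queueHead, and unlockSlowSketch are hypothetical, and the sketch ignores the extra synchronization a real implementation needs to protect the queue against concurrent enqueues.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical waiter node. Its pointer member gives it an alignment of at
// least 4 bytes, so the low bit of its address is free for the lock bit.
struct ThreadNodeSketch {
    ThreadNodeSketch* next = nullptr;
};

constexpr uintptr_t isLockedBit = 1;

// Packs the queue head pointer and the locked bit into one word.
inline uintptr_t makeLockWord(ThreadNodeSketch* head, bool isLocked)
{
    return reinterpret_cast<uintptr_t>(head) | (isLocked ? isLockedBit : 0);
}

// Extracts the queue head pointer, which may be null.
inline ThreadNodeSketch* queueHead(uintptr_t word)
{
    return reinterpret_cast<ThreadNodeSketch*>(word & ~isLockedBit);
}

// Releases the lock and returns the thread node to wake, or nullptr when the
// slow path was entered with an empty queue (for example after a spurious
// weak-CAS failure in the fast path).
inline ThreadNodeSketch* unlockSlowSketch(std::atomic<uintptr_t>& word)
{
    for (;;) {
        uintptr_t oldWord = word.load();
        ThreadNodeSketch* head = queueHead(oldWord);
        // Null head: just clear the lock bit. Otherwise also dequeue the head.
        uintptr_t newWord = head ? makeLockWord(head->next, false) : 0;
        // The weak CAS may fail spuriously, so retry until it succeeds.
        if (word.compare_exchange_weak(oldWord, newWord))
            return head;
    }
}
```

A caller would wake the returned node's thread only when the result is non-null. With an empty queue the function simply clears the lock bit and returns nullptr.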
[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/Atomics.h:
(WTF::Atomic::compareExchangeWeak):
(WTF::Atomic::compareExchangeStrong):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::LockBase::isLocked):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):
Tools:
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/165908@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-08-07 22:38:59 +00:00
} // namespace WTF
using WTF::Lock;
using WTF::LockHolder;