haikuwebkit/Source/WebCore/contentextensions/DFAMinimizer.cpp

510 lines
20 KiB
C++
Raw Permalink Normal View History

Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
/*
* Copyright (C) 2015 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
* THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "config.h"
#include "DFAMinimizer.h"
#if ENABLE(CONTENT_EXTENSIONS)
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
#include "DFA.h"
#include "DFANode.h"
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
#include "MutableRangeList.h"
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
#include <wtf/HashMap.h>
#include <wtf/Hasher.h>
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
#include <wtf/Vector.h>
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
namespace WebCore {
namespace ContentExtensions {
namespace {
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
template<typename VectorType, typename Iterable, typename Function>
static inline void iterateIntersections(const VectorType& singularTransitionsFirsts, const Iterable& iterableTransitionList, const Function& intersectionHandler)
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
{
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
ASSERT(!singularTransitionsFirsts.isEmpty());
auto otherIterator = iterableTransitionList.begin();
auto otherEnd = iterableTransitionList.end();
if (otherIterator == otherEnd)
return;
unsigned singularTransitionsLength = singularTransitionsFirsts.size();
unsigned singularTransitionsFirstsIndex = 0;
for (; otherIterator != otherEnd; ++otherIterator) {
auto firstCharacter = otherIterator.first();
while (singularTransitionsFirstsIndex < singularTransitionsLength
&& singularTransitionsFirsts[singularTransitionsFirstsIndex] != firstCharacter)
++singularTransitionsFirstsIndex;
intersectionHandler(singularTransitionsFirstsIndex, otherIterator);
++singularTransitionsFirstsIndex;
auto lastCharacter = otherIterator.last();
while (singularTransitionsFirstsIndex < singularTransitionsLength
&& singularTransitionsFirsts[singularTransitionsFirstsIndex] <= lastCharacter) {
intersectionHandler(singularTransitionsFirstsIndex, otherIterator);
++singularTransitionsFirstsIndex;
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
}
}
class Partition {
public:
void initialize(unsigned size)
{
if (!size)
return;
m_sets.reserveInitialCapacity(size);
m_partitionedElements.resize(size);
m_elementPositionInPartitionedNodes.resize(size);
m_elementToSetMap.resize(size);
for (unsigned i = 0; i < size; ++i) {
m_partitionedElements[i] = i;
m_elementPositionInPartitionedNodes[i] = i;
m_elementToSetMap[i] = 0;
}
Use Vector::uncheckedAppend() in more places https://bugs.webkit.org/show_bug.cgi?id=164952 Reviewed by Darin Adler. We can use Vector::uncheckedAppend() whenever the number of items that will be appended to a vector is equal to or less than its capacity. Using Vector::uncheckedAppend() is more efficient than Vector::append() as it avoids checking the capacity of the vector before appending a value to the end of it. * bindings/js/SerializedScriptValue.cpp: (WebCore::SerializedScriptValue::SerializedScriptValue): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::extractJumpTable): * contentextensions/DFAMinimizer.cpp: * css/StyleProperties.cpp: (WebCore::StyleProperties::copyPropertiesInSet): Use Vector::uncheckedAppend() and inline the assignment of the temporary variable value into the if condition to limit its scope as it is referenced exactly once in the loop body. * css/StyleRule.cpp: (WebCore::StyleRuleGroup::StyleRuleGroup): Use Vector::uncheckedAppend() and write for-loop using a C++11 range-based for-loop. * css/parser/CSSParserValues.cpp: (WebCore::CSSParserSelector::setLangArgumentList): * cssjit/SelectorCompiler.cpp: (WebCore::SelectorCompiler::SelectorCodeGenerator::generateElementIsNthLastChildOf): * dom/DocumentMarkerController.cpp: (WebCore::updateRenderedRectsForMarker): * page/scrolling/mac/ScrollingTreeFrameScrollingNodeMac.mm: (WebCore::convertToLayoutUnits): * platform/audio/AudioBus.cpp: (WebCore::AudioBus::AudioBus): * platform/graphics/FontCache.cpp: (WebCore::FontCache::purgeInactiveFontData): * platform/graphics/PathUtilities.cpp: (WebCore::polygonsForRect): * platform/mediastream/CaptureDeviceManager.cpp: (CaptureDeviceManager::bestSourcesForTypeAndConstraints): * platform/mediastream/RealtimeMediaSource.cpp: (WebCore::RealtimeMediaSource::fitnessDistance): * platform/network/cf/FormDataStreamCFNet.cpp: (WebCore::formCreate): Canonical link: https://commits.webkit.org/183067@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@209400 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-12-06 17:42:37 +00:00
m_sets.uncheckedAppend(SetDescriptor { 0, size, 0 });
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
void reserveUninitializedCapacity(unsigned elementCount)
{
m_partitionedElements.resize(elementCount);
m_elementPositionInPartitionedNodes.resize(elementCount);
m_elementToSetMap.resize(elementCount);
}
void addSetUnchecked(unsigned start, unsigned size)
{
m_sets.append(SetDescriptor { start, size, 0 });
}
void setElementUnchecked(unsigned elementIndex, unsigned positionInPartition, unsigned setIndex)
{
ASSERT(setIndex < m_sets.size());
m_partitionedElements[positionInPartition] = elementIndex;
m_elementPositionInPartitionedNodes[elementIndex] = positionInPartition;
m_elementToSetMap[elementIndex] = setIndex;
}
unsigned startOffsetOfSet(unsigned setIndex) const
{
return m_sets[setIndex].start;
}
ALWAYS_INLINE void markElementInCurrentGeneration(unsigned elementIndex)
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
{
// Swap the node with the first unmarked node.
unsigned setIndex = m_elementToSetMap[elementIndex];
SetDescriptor& setDescriptor = m_sets[setIndex];
unsigned elementPositionInPartition = m_elementPositionInPartitionedNodes[elementIndex];
ASSERT(elementPositionInPartition >= setDescriptor.start);
ASSERT(elementPositionInPartition < setDescriptor.end());
unsigned firstUnmarkedElementPositionInPartition = setDescriptor.indexAfterMarkedElements();
ASSERT(firstUnmarkedElementPositionInPartition >= setDescriptor.start && firstUnmarkedElementPositionInPartition < setDescriptor.end());
ASSERT(firstUnmarkedElementPositionInPartition >= firstUnmarkedElementPositionInPartition);
// Swap the nodes in the set.
unsigned firstUnmarkedElement = m_partitionedElements[firstUnmarkedElementPositionInPartition];
m_partitionedElements[firstUnmarkedElementPositionInPartition] = elementIndex;
m_partitionedElements[elementPositionInPartition] = firstUnmarkedElement;
// Update their index.
m_elementPositionInPartitionedNodes[elementIndex] = firstUnmarkedElementPositionInPartition;
m_elementPositionInPartitionedNodes[firstUnmarkedElement] = elementPositionInPartition;
if (!setDescriptor.markedCount) {
ASSERT(!m_setsMarkedInCurrentGeneration.contains(setIndex));
m_setsMarkedInCurrentGeneration.append(setIndex);
}
++setDescriptor.markedCount;
}
// The function passed as argument MUST not modify the partition.
template<typename Function>
void refineGeneration(const Function& function)
{
for (unsigned setIndex : m_setsMarkedInCurrentGeneration) {
SetDescriptor& setDescriptor = m_sets[setIndex];
if (setDescriptor.markedCount == setDescriptor.size) {
// Everything is marked, there is nothing to refine.
setDescriptor.markedCount = 0;
continue;
}
SetDescriptor newSet;
bool newSetIsMarkedSet = setDescriptor.markedCount * 2 <= setDescriptor.size;
if (newSetIsMarkedSet) {
// Less than half of the nodes have been marked.
newSet = { setDescriptor.start, setDescriptor.markedCount, 0 };
setDescriptor.start = setDescriptor.start + setDescriptor.markedCount;
} else
newSet = { setDescriptor.start + setDescriptor.markedCount, setDescriptor.size - setDescriptor.markedCount, 0 };
setDescriptor.size -= newSet.size;
setDescriptor.markedCount = 0;
unsigned newSetIndex = m_sets.size();
m_sets.append(newSet);
for (unsigned i = newSet.start; i < newSet.end(); ++i)
m_elementToSetMap[m_partitionedElements[i]] = newSetIndex;
function(newSetIndex);
}
m_setsMarkedInCurrentGeneration.clear();
}
// Call Function() on every node of a given subset.
template<typename Function>
void iterateSet(unsigned setIndex, const Function& function)
{
SetDescriptor& setDescriptor = m_sets[setIndex];
for (unsigned i = setDescriptor.start; i < setDescriptor.end(); ++i)
function(m_partitionedElements[i]);
}
// Index of the set containing the Node.
unsigned setIndex(unsigned elementIndex) const
{
return m_elementToSetMap[elementIndex];
}
// NodeIndex of the first element in the set.
unsigned firstElementInSet(unsigned setIndex) const
{
return m_partitionedElements[m_sets[setIndex].start];
}
unsigned size() const
{
return m_sets.size();
}
private:
struct SetDescriptor {
unsigned start;
unsigned size;
unsigned markedCount;
unsigned indexAfterMarkedElements() const { return start + markedCount; }
unsigned end() const { return start + size; }
};
// List of sets.
Vector<SetDescriptor, 0, ContentExtensionsOverflowHandler> m_sets;
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// All the element indices such that two elements of the same set never have a element of a different set between them.
Vector<unsigned, 0, ContentExtensionsOverflowHandler> m_partitionedElements;
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// Map elementIndex->position in the partitionedElements.
Vector<unsigned, 0, ContentExtensionsOverflowHandler> m_elementPositionInPartitionedNodes;
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// Map elementIndex->SetIndex.
Vector<unsigned, 0, ContentExtensionsOverflowHandler> m_elementToSetMap;
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// List of sets with any marked node. Each set can appear at most once.
// FIXME: find a good inline size for this.
Vector<unsigned, 128, ContentExtensionsOverflowHandler> m_setsMarkedInCurrentGeneration;
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
};
class FullGraphPartition {
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
typedef MutableRangeList<char, uint32_t, 128> SingularTransitionsMutableRangeList;
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
public:
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
FullGraphPartition(const DFA& dfa)
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
{
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
m_nodePartition.initialize(dfa.nodes.size());
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
SingularTransitionsMutableRangeList singularTransitions;
CounterConverter counterConverter;
for (const DFANode& node : dfa.nodes) {
if (node.isKilled())
continue;
auto transitions = node.transitions(dfa);
singularTransitions.extend(transitions.begin(), transitions.end(), counterConverter);
}
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
// Count the number of transition for each singular range. This will give us the bucket size
// for the transition partition, where transitions are partitioned by "symbol".
unsigned rangeIndexAccumulator = 0;
for (const auto& transition : singularTransitions) {
m_transitionPartition.addSetUnchecked(rangeIndexAccumulator, transition.data);
rangeIndexAccumulator += transition.data;
}
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// Count the number of incoming transitions per node.
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
m_flattenedTransitionsStartOffsetPerNode.resize(dfa.nodes.size());
memset(m_flattenedTransitionsStartOffsetPerNode.data(), 0, m_flattenedTransitionsStartOffsetPerNode.size() * sizeof(unsigned));
Vector<char, 0, ContentExtensionsOverflowHandler> singularTransitionsFirsts;
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
singularTransitionsFirsts.reserveInitialCapacity(singularTransitions.m_ranges.size());
for (const auto& transition : singularTransitions)
singularTransitionsFirsts.uncheckedAppend(transition.first);
for (const DFANode& node : dfa.nodes) {
if (node.isKilled())
continue;
auto transitions = node.transitions(dfa);
iterateIntersections(singularTransitionsFirsts, transitions, [&](unsigned, const DFANode::ConstRangeIterator& origin) {
uint32_t targetNodeIndex = origin.target();
++m_flattenedTransitionsStartOffsetPerNode[targetNodeIndex];
});
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
// Accumulate the offsets. This gives us the start position of each bucket.
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
unsigned transitionAccumulator = 0;
for (unsigned i = 0; i < m_flattenedTransitionsStartOffsetPerNode.size(); ++i) {
unsigned transitionsCountForNode = m_flattenedTransitionsStartOffsetPerNode[i];
m_flattenedTransitionsStartOffsetPerNode[i] = transitionAccumulator;
transitionAccumulator += transitionsCountForNode;
}
unsigned flattenedTransitionsSize = transitionAccumulator;
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
ASSERT_WITH_MESSAGE(flattenedTransitionsSize == rangeIndexAccumulator, "The number of transitions should be the same, regardless of how they are arranged in buckets.");
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
m_transitionPartition.reserveUninitializedCapacity(flattenedTransitionsSize);
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// Next, let's fill the transition table and set up the size of each group at the same time.
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
m_flattenedTransitionsSizePerNode.resize(dfa.nodes.size());
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
for (unsigned& counter : m_flattenedTransitionsSizePerNode)
counter = 0;
m_flattenedTransitions.resize(flattenedTransitionsSize);
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
Vector<uint32_t> transitionPerRangeOffset(m_transitionPartition.size());
memset(transitionPerRangeOffset.data(), 0, transitionPerRangeOffset.size() * sizeof(uint32_t));
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
for (unsigned i = 0; i < dfa.nodes.size(); ++i) {
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
const DFANode& node = dfa.nodes[i];
if (node.isKilled())
continue;
auto transitions = node.transitions(dfa);
iterateIntersections(singularTransitionsFirsts, transitions, [&](unsigned singularTransitonIndex, const DFANode::ConstRangeIterator& origin) {
uint32_t targetNodeIndex = origin.target();
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
unsigned start = m_flattenedTransitionsStartOffsetPerNode[targetNodeIndex];
unsigned offset = m_flattenedTransitionsSizePerNode[targetNodeIndex];
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
unsigned positionInFlattenedTransitions = start + offset;
m_flattenedTransitions[positionInFlattenedTransitions] = Transition({ i });
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
uint32_t& inRangeOffset = transitionPerRangeOffset[singularTransitonIndex];
unsigned positionInTransitionPartition = m_transitionPartition.startOffsetOfSet(singularTransitonIndex) + inRangeOffset;
++inRangeOffset;
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
m_transitionPartition.setElementUnchecked(positionInFlattenedTransitions, positionInTransitionPartition, singularTransitonIndex);
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
++m_flattenedTransitionsSizePerNode[targetNodeIndex];
});
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
}
void markNode(unsigned nodeIndex)
{
m_nodePartition.markElementInCurrentGeneration(nodeIndex);
}
void refinePartitions()
{
m_nodePartition.refineGeneration([&](unsigned smallestSetIndex) {
m_nodePartition.iterateSet(smallestSetIndex, [&](unsigned nodeIndex) {
unsigned incomingTransitionsStartForNode = m_flattenedTransitionsStartOffsetPerNode[nodeIndex];
unsigned incomingTransitionsSizeForNode = m_flattenedTransitionsSizePerNode[nodeIndex];
for (unsigned i = 0; i < incomingTransitionsSizeForNode; ++i)
m_transitionPartition.markElementInCurrentGeneration(incomingTransitionsStartForNode + i);
});
// We only need to split the transitions, we handle the new sets through the main loop.
m_transitionPartition.refineGeneration([](unsigned) { });
});
}
void splitByUniqueTransitions()
{
for (; m_nextTransitionSetToProcess < m_transitionPartition.size(); ++m_nextTransitionSetToProcess) {
// We use the known splitters to refine the set.
m_transitionPartition.iterateSet(m_nextTransitionSetToProcess, [&](unsigned transitionIndex) {
unsigned sourceNodeIndex = m_flattenedTransitions[transitionIndex].source;
m_nodePartition.markElementInCurrentGeneration(sourceNodeIndex);
});
refinePartitions();
}
}
unsigned nodeReplacement(unsigned nodeIndex)
{
unsigned setIndex = m_nodePartition.setIndex(nodeIndex);
return m_nodePartition.firstElementInSet(setIndex);
}
private:
struct Transition {
unsigned source;
};
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
struct CounterConverter {
uint32_t convert(uint32_t)
{
return 1;
}
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
void extend(uint32_t& destination, uint32_t)
{
++destination;
}
};
Vector<unsigned, 0, ContentExtensionsOverflowHandler> m_flattenedTransitionsStartOffsetPerNode;
Vector<unsigned, 0, ContentExtensionsOverflowHandler> m_flattenedTransitionsSizePerNode;
Vector<Transition, 0, ContentExtensionsOverflowHandler> m_flattenedTransitions;
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
Partition m_nodePartition;
Partition m_transitionPartition;
unsigned m_nextTransitionSetToProcess { 0 };
};
struct ActionKey {
enum DeletedValueTag { DeletedValue };
explicit ActionKey(DeletedValueTag) { state = Deleted; }
enum EmptyValueTag { EmptyValue };
explicit ActionKey(EmptyValueTag) { state = Empty; }
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
explicit ActionKey(const DFA* dfa, uint32_t actionsStart, uint16_t actionsLength)
: dfa(dfa)
, actionsStart(actionsStart)
, actionsLength(actionsLength)
, state(Valid)
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
{
StringHasher hasher;
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
hasher.addCharactersAssumingAligned(reinterpret_cast<const UChar*>(&dfa->actions[actionsStart]), actionsLength * sizeof(uint64_t) / sizeof(UChar));
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
hash = hasher.hash();
}
bool isEmptyValue() const { return state == Empty; }
bool isDeletedValue() const { return state == Deleted; }
unsigned hash { 0 };
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
const DFA* dfa { nullptr };
uint32_t actionsStart { 0 };
uint16_t actionsLength { 0 };
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
enum {
Empty,
Valid,
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
Deleted
} state { Empty };
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
};
struct ActionKeyHash {
static unsigned hash(const ActionKey& actionKey)
{
return actionKey.hash;
}
static bool equal(const ActionKey& a, const ActionKey& b)
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
{
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
if (a.state != b.state
|| a.dfa != b.dfa
|| a.actionsLength != b.actionsLength)
return false;
for (uint16_t i = 0; i < a.actionsLength; ++i) {
if (a.dfa->actions[a.actionsStart + i] != a.dfa->actions[b.actionsStart + i])
return false;
}
return true;
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
static const bool safeToCompareToEmptyOrDeleted = false;
};
struct ActionKeyHashTraits : public WTF::CustomHashTraits<ActionKey> {
static const bool emptyValueIsZero = true;
};
} // anonymous namespace.
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
void DFAMinimizer::minimize(DFA& dfa)
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
{
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
FullGraphPartition fullGraphPartition(dfa);
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// Unlike traditional minimization final states can be differentiated by their action.
// Instead of creating a single set for the final state, we partition by actions from
// the start.
HashMap<ActionKey, Vector<unsigned>, ActionKeyHash, ActionKeyHashTraits> finalStates;
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
for (unsigned i = 0; i < dfa.nodes.size(); ++i) {
const DFANode& node = dfa.nodes[i];
if (node.hasActions()) {
// FIXME: Sort the actions in the dfa to make nodes that have the same actions in different order equal.
auto addResult = finalStates.add(ActionKey(&dfa, node.actionsStart(), node.actionsLength()), Vector<unsigned>());
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
addResult.iterator->value.append(i);
}
}
for (const auto& slot : finalStates) {
for (unsigned finalStateIndex : slot.value)
fullGraphPartition.markNode(finalStateIndex);
fullGraphPartition.refinePartitions();
}
// Use every splitter to refine the node partitions.
fullGraphPartition.splitByUniqueTransitions();
Vector<unsigned> relocationVector;
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
relocationVector.reserveInitialCapacity(dfa.nodes.size());
for (unsigned i = 0; i < dfa.nodes.size(); ++i)
relocationVector.uncheckedAppend(i);
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
// Update all the transitions.
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
for (unsigned i = 0; i < dfa.nodes.size(); ++i) {
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
unsigned replacement = fullGraphPartition.nodeReplacement(i);
if (i != replacement) {
relocationVector[i] = replacement;
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
dfa.nodes[i].kill(dfa);
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
}
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
dfa.root = relocationVector[dfa.root];
Use less memory when compiling content extensions. https://bugs.webkit.org/show_bug.cgi?id=144051 Patch by Alex Christensen <achristensen@webkit.org> on 2015-04-23 Reviewed by Darin Adler and Benjamin Poulain. Source/WebCore: No change in functionality, correctness already covered by existing tests. Before this patch, a DFANode contained a HashSet of transitions. Large vectors of DFANodes made many small HashSets, which was inefficient use of memory. We now put all the actions and transitions into one big compact Vector and each node owns ranges in that vector. * contentextensions/CombinedURLFilters.cpp: (WebCore::ContentExtensions::recursiveMemoryUsed): (WebCore::ContentExtensions::CombinedURLFilters::memoryUsed): * contentextensions/CombinedURLFilters.h: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/ContentExtensionsDebugging.h: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): (WebCore::ContentExtensions::DFANode::changeFallbackTransition): (WebCore::ContentExtensions::DFANode::addFallbackTransition): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::DFA): Deleted. (WebCore::ContentExtensions::DFA::operator=): Deleted. * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): (WebCore::ContentExtensions::DFABytecodeCompiler::compile): * contentextensions/DFABytecodeCompiler.h: * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: * contentextensions/DFANode.h: (WebCore::ContentExtensions::DFANode::isKilled): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): (WebCore::ContentExtensions::DFANode::hasActions): (WebCore::ContentExtensions::DFANode::transitionsLength): (WebCore::ContentExtensions::DFANode::actionsLength): (WebCore::ContentExtensions::DFANode::actionsStart): (WebCore::ContentExtensions::DFANode::setActions): (WebCore::ContentExtensions::DFANode::setTransitions): (WebCore::ContentExtensions::DFANode::resetTransitions): (WebCore::ContentExtensions::DFANode::transitionsStart): (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): * contentextensions/NFA.cpp: (WebCore::ContentExtensions::NFA::memoryUsed): * contentextensions/NFA.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource): (WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate): (WebCore::ContentExtensions::getOrCreateDFANode): (WebCore::ContentExtensions::NFAToDFA::convert): Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: (TestWebKitAPI::TEST_F): * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: (TestWebKitAPI::countLiveNodes): Canonical link: https://commits.webkit.org/162080@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@183204 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-23 19:56:57 +00:00
for (DFANode& node : dfa.nodes) {
[Content Extensions] Make the DFA transitions ranges instead of characters https://bugs.webkit.org/show_bug.cgi?id=146575 Patch by Benjamin Poulain <benjamin@webkit.org> on 2015-07-06 Reviewed by Alex Christensen. Source/WebCore: This patch changes the DFA and code using the DFA to use ranges to represent the transitions between any two nodes. This patch builds on top of the tools introduced in r186079. The DFA structure is basically the same as ImmutableNFA but without any epsilon transitions. This patch introduces a transition iterator to make the DFA compatible with the existing algorithms. --- The DFA combiner is rebuilt on top of MutableRangeList. Combining the transitions of two nodes is one by merging the range list of each not into a common MutableRangeList. The data converter takes care of creating the signature of the combination. The code got simpler since MutableRangeList does most of the work now. It is also much faster. --- The minimizer is more intersting. With the current algorithm, we cannot resolve overlaps between ranges. On the other hand, the minimizer does not care about the symbol of the transitions if we are careful to partition transitions of the same symbol together. What I did was to turn the minimizer into a pure transition based one, BUT each "symbol" is actually an unbreakable range. The first step is to go over all the transitions of all the nodes and find the largest ranges such that the alphabet of interest is covered but there is not a single intersection between any two nodes (what I called "singular transitions" in the code). This can be done efficiently with MutableRangeList. A little trick there is that I also used the converter to count how many real transition overlaps any singular transition. Those singular transitions become the alphabet of our minimizer. The "symbol" of our alphabet is simply the position of the singular transition in the list. The partition of transition is created by populating each set with all the transition that overlaps the symbols. Note that since the partition is created on the fly, the Transition structure used for repartitioning only contains the source of the transitions. Once our transition parition has been carefuly created, we can completely forget about the symbols and only work with subsets. Since the singular transitions have no overlap (unlike fallback transitions), this new minimizer will find the minimial solution for well formed input. * WebCore.xcodeproj/project.pbxproj: * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::memoryUsed): (WebCore::ContentExtensions::printTransitions): (WebCore::ContentExtensions::DFANode::actions): Deleted. (WebCore::ContentExtensions::DFANode::transitions): Deleted. (WebCore::ContentExtensions::DFANode::fallbackTransitionDestination): Deleted. (WebCore::ContentExtensions::DFANode::changeFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::addFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::containsTransition): Deleted. (WebCore::ContentExtensions::DFANode::kill): Deleted. (WebCore::ContentExtensions::DFA::debugPrintDot): Deleted. * contentextensions/DFA.h: (WebCore::ContentExtensions::DFANode::ConstRangeIterator::range): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::range): (WebCore::ContentExtensions::DFANode::RangeIterator::target): (WebCore::ContentExtensions::DFANode::RangeIterator::resetTarget): * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::ranges): (WebCore::ContentExtensions::DFABytecodeCompiler::nodeTransitionsMaxBytecodeSize): (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions): * contentextensions/DFACombiner.cpp: (WebCore::ContentExtensions::DFAMerger::TargetConverter::convert): (WebCore::ContentExtensions::DFAMerger::TargetConverter::extend): (WebCore::ContentExtensions::DFAMerger::TargetConverter::setHalfSignature): (WebCore::ContentExtensions::DFAMerger::merge): (WebCore::ContentExtensions::DFAMerger::getOrCreateCombinedNode): (WebCore::ContentExtensions::DFAMerger::setHalfSignature): Deleted. (WebCore::ContentExtensions::DFAMerger::populateTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::populateFromFallbackTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createTransitions): Deleted. (WebCore::ContentExtensions::DFAMerger::createFallbackTransitionIfNeeded): Deleted. * contentextensions/DFAMinimizer.cpp: (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFANode.cpp: Added. (WebCore::ContentExtensions::DFANode::actions): (WebCore::ContentExtensions::DFANode::containsTransition): (WebCore::ContentExtensions::DFANode::kill): (WebCore::ContentExtensions::DFANode::canUseFallbackTransition): (WebCore::ContentExtensions::DFANode::bestFallbackTarget): * contentextensions/DFANode.h: (WebCore::ContentExtensions::CharRange::size): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator*): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator==): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::operator++): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::first): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::last): (WebCore::ContentExtensions::DFANode::ConstRangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableConstRange::begin): (WebCore::ContentExtensions::DFANode::IterableConstRange::end): (WebCore::ContentExtensions::DFANode::transitions): (WebCore::ContentExtensions::DFANode::RangeIterator::operator*): (WebCore::ContentExtensions::DFANode::RangeIterator::operator==): (WebCore::ContentExtensions::DFANode::RangeIterator::operator!=): (WebCore::ContentExtensions::DFANode::RangeIterator::operator++): (WebCore::ContentExtensions::DFANode::RangeIterator::first): (WebCore::ContentExtensions::DFANode::RangeIterator::last): (WebCore::ContentExtensions::DFANode::RangeIterator::data): (WebCore::ContentExtensions::DFANode::IterableRange::begin): (WebCore::ContentExtensions::DFANode::IterableRange::end): (WebCore::ContentExtensions::DFANode::hasFallbackTransition): Deleted. (WebCore::ContentExtensions::DFANode::transitionsLength): Deleted. (WebCore::ContentExtensions::DFANode::transitionsStart): Deleted. (WebCore::ContentExtensions::DFANode::resetTransitions): Deleted. (WebCore::ContentExtensions::DFANode::setHasFallbackTransitionWithoutChangingDFA): Deleted. * contentextensions/ImmutableNFA.h: (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::range): (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->): Deleted. * contentextensions/ImmutableNFANodeBuilder.h: (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::first): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::last): (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator*): Deleted. (WebCore::ContentExtensions::ImmutableNFANodeBuilder::FakeRangeIterator::operator->): Deleted. * contentextensions/MutableRange.h: (WebCore::ContentExtensions::MutableRange::size): Deleted. * contentextensions/MutableRangeList.h: (WebCore::ContentExtensions::MutableRangeList::ConstIterator::first): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::last): (WebCore::ContentExtensions::MutableRangeList::ConstIterator::data): (WebCore::ContentExtensions::MutableRangeList::extend): (WebCore::ContentExtensions::MutableRangeList::size): (WebCore::ContentExtensions::MutableRangeList::initializeFrom): * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::canUseFallbackTransition): Deleted. (WebCore::ContentExtensions::findBestFallbackTarget): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Since the minimizer is perfect, we get the minimal solution now, which is really cool! Canonical link: https://commits.webkit.org/164740@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-07-06 21:06:30 +00:00
if (node.isKilled())
continue;
for (auto& transition : node.transitions(dfa)) {
uint32_t target = transition.target();
uint32_t relocatedTarget = relocationVector[target];
if (target != relocatedTarget)
transition.resetTarget(relocatedTarget);
}
Add a conservative DFA minimizer for the content extension matcher https://bugs.webkit.org/show_bug.cgi?id=143501 Reviewed by Alex Christensen. Source/WebCore: This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer: some indistinguishable are not merged, but no two distinguishable are merged. The general idea of the algorithm is to put all the state into a single set and partition iteratively until it is impossible to split any subset by using a transition to distinguish two states. Let's ignore fallback transition for now, and I'll explain later how they fit in the big picture. The first thing we do is create a partition of the transition by grouping every transition by the same character in the same subset. This partition of transitions is the base by which we will partition the states. Each subset in the transition partition is a "distinguisher" by which we can separate the state partition. We also create a second partition, the state partition. This is where we keep all the subsets of states that have been split so far. Let say we have the following graph. 1 --a--> 2 1 --b--> 3 2 --c--> 4 (final) 3 --c--> 4 (final) The partition of transition would start with: Set 0: 1 --a--> 2 Set 1: 1 --b--> 3 Set 2: 2 --c--> 4 3 --c--> 4 The state partition would have a single set with { 1, 2, 3, 4 }. Next, we split the state partition by distinguishable final states. In this case, we would split it into { 1, 2, 3 }, { 4 }. We then refine the transition partition by splitting it by the states that have been distinguished. Here, the only transitions to 4 are both is the same set (set 2), so the transition partition remains the same. We can now execute the main loop of the algorithm: 1) Split the states by the transitions. 2) Split the transitions that are now reaching two different sets of the state partition. 3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher" to process. In this case, we just iterate over the partition set in order, and add newly split transitions to the end of the list. In the example, we would first visit set 0. We have that state 1 is distinguishable by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }. We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do. Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same set -> nothing to do. There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3. --- Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative approach: we split everything assuming fallback transition do not exist, then we refine by the fallback transitions. Let's take the following example: 1 --a--> 3 2 --a--> 3 1 -[f]-> 4 2 -[f]-> 5 and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }. The states 1 and 2 are together because they cannot be distinguished by 'a', but the fallback transition distinguishes them. Since we have done every other split, we have one useful property: we know that every state in every set transition with the exact set of characters within that set. If that was not true, there would be one "distinguisher" 'x' that could spit the set into two subsets: the one with the transition 'x' and the ones without. Since all the transitions are the same, there is no overlap between the defined transition and the fallback transition. Consequently, we can use the fallback transition as a whole transition and use it to distinguish the states. The fallback transitions are handled like any other transition, we have a partition of such transitions and split by each of them. BUT, we can only use them after every unique transition has been covered. This trick is also what makes the minimization imperfect: it should be possible to merge states with overlap in their fallback transitions but we would split them. --- Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect work on this patch. Thanks for your wonderful papers about DFA minimization. * WebCore.xcodeproj/project.pbxproj: * contentextensions/ContentExtensionCompiler.cpp: (WebCore::ContentExtensions::compileRuleList): * contentextensions/DFA.cpp: (WebCore::ContentExtensions::DFA::minimize): (WebCore::ContentExtensions::DFA::debugPrintDot): * contentextensions/DFA.h: * contentextensions/DFABytecodeCompiler.cpp: (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode): * contentextensions/DFAMinimizer.cpp: Added. (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions): (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize): (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration): (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex): (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet): (WebCore::ContentExtensions::DFAMinimizer::Partition::size): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements): (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions): (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash): (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal): (WebCore::ContentExtensions::DFAMinimizer::minimize): * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h. * contentextensions/DFANode.h: * contentextensions/NFAToDFA.cpp: (WebCore::ContentExtensions::NFAToDFA::convert): (WebCore::ContentExtensions::simplifyTransitions): Deleted. Tools: * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp: * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp: Canonical link: https://commits.webkit.org/161756@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
}
} // namespace ContentExtensions
} // namespace WebCore
#endif // ENABLE(CONTENT_EXTENSIONS)