Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
/*
* Copyright ( C ) 2015 Apple Inc . All rights reserved .
*
* Redistribution and use in source and binary forms , with or without
* modification , are permitted provided that the following conditions
* are met :
* 1. Redistributions of source code must retain the above copyright
* notice , this list of conditions and the following disclaimer .
* 2. Redistributions in binary form must reproduce the above copyright
* notice , this list of conditions and the following disclaimer in the
* documentation and / or other materials provided with the distribution .
*
* THIS SOFTWARE IS PROVIDED BY APPLE INC . AND ITS CONTRIBUTORS ` ` AS IS ' '
* AND ANY EXPRESS OR IMPLIED WARRANTIES , INCLUDING , BUT NOT LIMITED TO ,
* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED . IN NO EVENT SHALL APPLE INC . OR ITS CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT , INDIRECT , INCIDENTAL , SPECIAL , EXEMPLARY , OR
* CONSEQUENTIAL DAMAGES ( INCLUDING , BUT NOT LIMITED TO , PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES ; LOSS OF USE , DATA , OR PROFITS ; OR BUSINESS
* INTERRUPTION ) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY , WHETHER IN
* CONTRACT , STRICT LIABILITY , OR TORT ( INCLUDING NEGLIGENCE OR OTHERWISE )
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE , EVEN IF ADVISED OF
* THE POSSIBILITY OF SUCH DAMAGE .
*/
# include "config.h"
# include "DFAMinimizer.h"
# if ENABLE(CONTENT_EXTENSIONS)
2015-04-23 19:56:57 +00:00
# include "DFA.h"
# include "DFANode.h"
2015-07-06 21:06:30 +00:00
# include "MutableRangeList.h"
2015-04-23 19:56:57 +00:00
# include <wtf/HashMap.h>
2015-08-17 23:59:05 +00:00
# include <wtf/Hasher.h>
2015-04-23 19:56:57 +00:00
# include <wtf/Vector.h>
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
namespace WebCore {
namespace ContentExtensions {
namespace {
2015-07-06 21:06:30 +00:00
template < typename VectorType , typename Iterable , typename Function >
static inline void iterateIntersections ( const VectorType & singularTransitionsFirsts , const Iterable & iterableTransitionList , const Function & intersectionHandler )
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
{
2015-07-06 21:06:30 +00:00
ASSERT ( ! singularTransitionsFirsts . isEmpty ( ) ) ;
auto otherIterator = iterableTransitionList . begin ( ) ;
auto otherEnd = iterableTransitionList . end ( ) ;
if ( otherIterator = = otherEnd )
return ;
unsigned singularTransitionsLength = singularTransitionsFirsts . size ( ) ;
unsigned singularTransitionsFirstsIndex = 0 ;
for ( ; otherIterator ! = otherEnd ; + + otherIterator ) {
auto firstCharacter = otherIterator . first ( ) ;
while ( singularTransitionsFirstsIndex < singularTransitionsLength
& & singularTransitionsFirsts [ singularTransitionsFirstsIndex ] ! = firstCharacter )
+ + singularTransitionsFirstsIndex ;
intersectionHandler ( singularTransitionsFirstsIndex , otherIterator ) ;
+ + singularTransitionsFirstsIndex ;
auto lastCharacter = otherIterator . last ( ) ;
while ( singularTransitionsFirstsIndex < singularTransitionsLength
& & singularTransitionsFirsts [ singularTransitionsFirstsIndex ] < = lastCharacter ) {
intersectionHandler ( singularTransitionsFirstsIndex , otherIterator ) ;
+ + singularTransitionsFirstsIndex ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
}
}
class Partition {
public :
void initialize ( unsigned size )
{
if ( ! size )
return ;
m_sets . reserveInitialCapacity ( size ) ;
m_partitionedElements . resize ( size ) ;
m_elementPositionInPartitionedNodes . resize ( size ) ;
m_elementToSetMap . resize ( size ) ;
for ( unsigned i = 0 ; i < size ; + + i ) {
m_partitionedElements [ i ] = i ;
m_elementPositionInPartitionedNodes [ i ] = i ;
m_elementToSetMap [ i ] = 0 ;
}
2016-12-06 17:42:37 +00:00
m_sets . uncheckedAppend ( SetDescriptor { 0 , size , 0 } ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
2015-07-06 21:06:30 +00:00
void reserveUninitializedCapacity ( unsigned elementCount )
{
m_partitionedElements . resize ( elementCount ) ;
m_elementPositionInPartitionedNodes . resize ( elementCount ) ;
m_elementToSetMap . resize ( elementCount ) ;
}
void addSetUnchecked ( unsigned start , unsigned size )
{
m_sets . append ( SetDescriptor { start , size , 0 } ) ;
}
void setElementUnchecked ( unsigned elementIndex , unsigned positionInPartition , unsigned setIndex )
{
ASSERT ( setIndex < m_sets . size ( ) ) ;
m_partitionedElements [ positionInPartition ] = elementIndex ;
m_elementPositionInPartitionedNodes [ elementIndex ] = positionInPartition ;
m_elementToSetMap [ elementIndex ] = setIndex ;
}
unsigned startOffsetOfSet ( unsigned setIndex ) const
{
return m_sets [ setIndex ] . start ;
}
ALWAYS_INLINE void markElementInCurrentGeneration ( unsigned elementIndex )
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
{
// Swap the node with the first unmarked node.
unsigned setIndex = m_elementToSetMap [ elementIndex ] ;
SetDescriptor & setDescriptor = m_sets [ setIndex ] ;
unsigned elementPositionInPartition = m_elementPositionInPartitionedNodes [ elementIndex ] ;
ASSERT ( elementPositionInPartition > = setDescriptor . start ) ;
ASSERT ( elementPositionInPartition < setDescriptor . end ( ) ) ;
unsigned firstUnmarkedElementPositionInPartition = setDescriptor . indexAfterMarkedElements ( ) ;
ASSERT ( firstUnmarkedElementPositionInPartition > = setDescriptor . start & & firstUnmarkedElementPositionInPartition < setDescriptor . end ( ) ) ;
ASSERT ( firstUnmarkedElementPositionInPartition > = firstUnmarkedElementPositionInPartition ) ;
// Swap the nodes in the set.
unsigned firstUnmarkedElement = m_partitionedElements [ firstUnmarkedElementPositionInPartition ] ;
m_partitionedElements [ firstUnmarkedElementPositionInPartition ] = elementIndex ;
m_partitionedElements [ elementPositionInPartition ] = firstUnmarkedElement ;
// Update their index.
m_elementPositionInPartitionedNodes [ elementIndex ] = firstUnmarkedElementPositionInPartition ;
m_elementPositionInPartitionedNodes [ firstUnmarkedElement ] = elementPositionInPartition ;
if ( ! setDescriptor . markedCount ) {
ASSERT ( ! m_setsMarkedInCurrentGeneration . contains ( setIndex ) ) ;
m_setsMarkedInCurrentGeneration . append ( setIndex ) ;
}
+ + setDescriptor . markedCount ;
}
// The function passed as argument MUST not modify the partition.
template < typename Function >
void refineGeneration ( const Function & function )
{
for ( unsigned setIndex : m_setsMarkedInCurrentGeneration ) {
SetDescriptor & setDescriptor = m_sets [ setIndex ] ;
if ( setDescriptor . markedCount = = setDescriptor . size ) {
// Everything is marked, there is nothing to refine.
setDescriptor . markedCount = 0 ;
continue ;
}
SetDescriptor newSet ;
bool newSetIsMarkedSet = setDescriptor . markedCount * 2 < = setDescriptor . size ;
if ( newSetIsMarkedSet ) {
// Less than half of the nodes have been marked.
newSet = { setDescriptor . start , setDescriptor . markedCount , 0 } ;
setDescriptor . start = setDescriptor . start + setDescriptor . markedCount ;
} else
newSet = { setDescriptor . start + setDescriptor . markedCount , setDescriptor . size - setDescriptor . markedCount , 0 } ;
setDescriptor . size - = newSet . size ;
setDescriptor . markedCount = 0 ;
unsigned newSetIndex = m_sets . size ( ) ;
m_sets . append ( newSet ) ;
for ( unsigned i = newSet . start ; i < newSet . end ( ) ; + + i )
m_elementToSetMap [ m_partitionedElements [ i ] ] = newSetIndex ;
function ( newSetIndex ) ;
}
m_setsMarkedInCurrentGeneration . clear ( ) ;
}
// Call Function() on every node of a given subset.
template < typename Function >
void iterateSet ( unsigned setIndex , const Function & function )
{
SetDescriptor & setDescriptor = m_sets [ setIndex ] ;
for ( unsigned i = setDescriptor . start ; i < setDescriptor . end ( ) ; + + i )
function ( m_partitionedElements [ i ] ) ;
}
// Index of the set containing the Node.
unsigned setIndex ( unsigned elementIndex ) const
{
return m_elementToSetMap [ elementIndex ] ;
}
// NodeIndex of the first element in the set.
unsigned firstElementInSet ( unsigned setIndex ) const
{
return m_partitionedElements [ m_sets [ setIndex ] . start ] ;
}
unsigned size ( ) const
{
return m_sets . size ( ) ;
}
private :
struct SetDescriptor {
unsigned start ;
unsigned size ;
unsigned markedCount ;
unsigned indexAfterMarkedElements ( ) const { return start + markedCount ; }
unsigned end ( ) const { return start + size ; }
} ;
// List of sets.
2015-07-08 05:44:42 +00:00
Vector < SetDescriptor , 0 , ContentExtensionsOverflowHandler > m_sets ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// All the element indices such that two elements of the same set never have a element of a different set between them.
2015-07-08 05:44:42 +00:00
Vector < unsigned , 0 , ContentExtensionsOverflowHandler > m_partitionedElements ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// Map elementIndex->position in the partitionedElements.
2015-07-08 05:44:42 +00:00
Vector < unsigned , 0 , ContentExtensionsOverflowHandler > m_elementPositionInPartitionedNodes ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// Map elementIndex->SetIndex.
2015-07-08 05:44:42 +00:00
Vector < unsigned , 0 , ContentExtensionsOverflowHandler > m_elementToSetMap ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// List of sets with any marked node. Each set can appear at most once.
// FIXME: find a good inline size for this.
2015-07-08 05:44:42 +00:00
Vector < unsigned , 128 , ContentExtensionsOverflowHandler > m_setsMarkedInCurrentGeneration ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
} ;
class FullGraphPartition {
2015-07-06 21:06:30 +00:00
typedef MutableRangeList < char , uint32_t , 128 > SingularTransitionsMutableRangeList ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
public :
2015-04-23 19:56:57 +00:00
FullGraphPartition ( const DFA & dfa )
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
{
2015-04-23 19:56:57 +00:00
m_nodePartition . initialize ( dfa . nodes . size ( ) ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
2015-07-06 21:06:30 +00:00
SingularTransitionsMutableRangeList singularTransitions ;
CounterConverter counterConverter ;
for ( const DFANode & node : dfa . nodes ) {
if ( node . isKilled ( ) )
continue ;
auto transitions = node . transitions ( dfa ) ;
singularTransitions . extend ( transitions . begin ( ) , transitions . end ( ) , counterConverter ) ;
}
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
2015-07-06 21:06:30 +00:00
// Count the number of transition for each singular range. This will give us the bucket size
// for the transition partition, where transitions are partitioned by "symbol".
unsigned rangeIndexAccumulator = 0 ;
for ( const auto & transition : singularTransitions ) {
m_transitionPartition . addSetUnchecked ( rangeIndexAccumulator , transition . data ) ;
rangeIndexAccumulator + = transition . data ;
}
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// Count the number of incoming transitions per node.
2015-07-06 21:06:30 +00:00
m_flattenedTransitionsStartOffsetPerNode . resize ( dfa . nodes . size ( ) ) ;
memset ( m_flattenedTransitionsStartOffsetPerNode . data ( ) , 0 , m_flattenedTransitionsStartOffsetPerNode . size ( ) * sizeof ( unsigned ) ) ;
2015-07-08 05:44:42 +00:00
Vector < char , 0 , ContentExtensionsOverflowHandler > singularTransitionsFirsts ;
2015-07-06 21:06:30 +00:00
singularTransitionsFirsts . reserveInitialCapacity ( singularTransitions . m_ranges . size ( ) ) ;
for ( const auto & transition : singularTransitions )
singularTransitionsFirsts . uncheckedAppend ( transition . first ) ;
for ( const DFANode & node : dfa . nodes ) {
if ( node . isKilled ( ) )
continue ;
auto transitions = node . transitions ( dfa ) ;
iterateIntersections ( singularTransitionsFirsts , transitions , [ & ] ( unsigned , const DFANode : : ConstRangeIterator & origin ) {
uint32_t targetNodeIndex = origin . target ( ) ;
+ + m_flattenedTransitionsStartOffsetPerNode [ targetNodeIndex ] ;
} ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
2015-07-06 21:06:30 +00:00
// Accumulate the offsets. This gives us the start position of each bucket.
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
unsigned transitionAccumulator = 0 ;
for ( unsigned i = 0 ; i < m_flattenedTransitionsStartOffsetPerNode . size ( ) ; + + i ) {
unsigned transitionsCountForNode = m_flattenedTransitionsStartOffsetPerNode [ i ] ;
m_flattenedTransitionsStartOffsetPerNode [ i ] = transitionAccumulator ;
transitionAccumulator + = transitionsCountForNode ;
}
unsigned flattenedTransitionsSize = transitionAccumulator ;
2015-07-06 21:06:30 +00:00
ASSERT_WITH_MESSAGE ( flattenedTransitionsSize = = rangeIndexAccumulator , " The number of transitions should be the same, regardless of how they are arranged in buckets. " ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
2015-07-06 21:06:30 +00:00
m_transitionPartition . reserveUninitializedCapacity ( flattenedTransitionsSize ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// Next, let's fill the transition table and set up the size of each group at the same time.
2015-04-23 19:56:57 +00:00
m_flattenedTransitionsSizePerNode . resize ( dfa . nodes . size ( ) ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
for ( unsigned & counter : m_flattenedTransitionsSizePerNode )
counter = 0 ;
m_flattenedTransitions . resize ( flattenedTransitionsSize ) ;
2015-07-06 21:06:30 +00:00
Vector < uint32_t > transitionPerRangeOffset ( m_transitionPartition . size ( ) ) ;
memset ( transitionPerRangeOffset . data ( ) , 0 , transitionPerRangeOffset . size ( ) * sizeof ( uint32_t ) ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
2015-04-23 19:56:57 +00:00
for ( unsigned i = 0 ; i < dfa . nodes . size ( ) ; + + i ) {
2015-07-06 21:06:30 +00:00
const DFANode & node = dfa . nodes [ i ] ;
if ( node . isKilled ( ) )
continue ;
auto transitions = node . transitions ( dfa ) ;
iterateIntersections ( singularTransitionsFirsts , transitions , [ & ] ( unsigned singularTransitonIndex , const DFANode : : ConstRangeIterator & origin ) {
uint32_t targetNodeIndex = origin . target ( ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
unsigned start = m_flattenedTransitionsStartOffsetPerNode [ targetNodeIndex ] ;
unsigned offset = m_flattenedTransitionsSizePerNode [ targetNodeIndex ] ;
2015-07-06 21:06:30 +00:00
unsigned positionInFlattenedTransitions = start + offset ;
m_flattenedTransitions [ positionInFlattenedTransitions ] = Transition ( { i } ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
2015-07-06 21:06:30 +00:00
uint32_t & inRangeOffset = transitionPerRangeOffset [ singularTransitonIndex ] ;
unsigned positionInTransitionPartition = m_transitionPartition . startOffsetOfSet ( singularTransitonIndex ) + inRangeOffset ;
+ + inRangeOffset ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
2015-07-06 21:06:30 +00:00
m_transitionPartition . setElementUnchecked ( positionInFlattenedTransitions , positionInTransitionPartition , singularTransitonIndex ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
2015-07-06 21:06:30 +00:00
+ + m_flattenedTransitionsSizePerNode [ targetNodeIndex ] ;
} ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
}
void markNode ( unsigned nodeIndex )
{
m_nodePartition . markElementInCurrentGeneration ( nodeIndex ) ;
}
void refinePartitions ( )
{
m_nodePartition . refineGeneration ( [ & ] ( unsigned smallestSetIndex ) {
m_nodePartition . iterateSet ( smallestSetIndex , [ & ] ( unsigned nodeIndex ) {
unsigned incomingTransitionsStartForNode = m_flattenedTransitionsStartOffsetPerNode [ nodeIndex ] ;
unsigned incomingTransitionsSizeForNode = m_flattenedTransitionsSizePerNode [ nodeIndex ] ;
for ( unsigned i = 0 ; i < incomingTransitionsSizeForNode ; + + i )
m_transitionPartition . markElementInCurrentGeneration ( incomingTransitionsStartForNode + i ) ;
} ) ;
// We only need to split the transitions, we handle the new sets through the main loop.
m_transitionPartition . refineGeneration ( [ ] ( unsigned ) { } ) ;
} ) ;
}
void splitByUniqueTransitions ( )
{
for ( ; m_nextTransitionSetToProcess < m_transitionPartition . size ( ) ; + + m_nextTransitionSetToProcess ) {
// We use the known splitters to refine the set.
m_transitionPartition . iterateSet ( m_nextTransitionSetToProcess , [ & ] ( unsigned transitionIndex ) {
unsigned sourceNodeIndex = m_flattenedTransitions [ transitionIndex ] . source ;
m_nodePartition . markElementInCurrentGeneration ( sourceNodeIndex ) ;
} ) ;
refinePartitions ( ) ;
}
}
unsigned nodeReplacement ( unsigned nodeIndex )
{
unsigned setIndex = m_nodePartition . setIndex ( nodeIndex ) ;
return m_nodePartition . firstElementInSet ( setIndex ) ;
}
private :
struct Transition {
unsigned source ;
} ;
2015-07-06 21:06:30 +00:00
struct CounterConverter {
uint32_t convert ( uint32_t )
{
return 1 ;
}
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
2015-07-06 21:06:30 +00:00
void extend ( uint32_t & destination , uint32_t )
{
+ + destination ;
}
} ;
2015-07-08 05:44:42 +00:00
Vector < unsigned , 0 , ContentExtensionsOverflowHandler > m_flattenedTransitionsStartOffsetPerNode ;
Vector < unsigned , 0 , ContentExtensionsOverflowHandler > m_flattenedTransitionsSizePerNode ;
Vector < Transition , 0 , ContentExtensionsOverflowHandler > m_flattenedTransitions ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
Partition m_nodePartition ;
Partition m_transitionPartition ;
unsigned m_nextTransitionSetToProcess { 0 } ;
} ;
struct ActionKey {
enum DeletedValueTag { DeletedValue } ;
explicit ActionKey ( DeletedValueTag ) { state = Deleted ; }
enum EmptyValueTag { EmptyValue } ;
explicit ActionKey ( EmptyValueTag ) { state = Empty ; }
2015-04-23 19:56:57 +00:00
explicit ActionKey ( const DFA * dfa , uint32_t actionsStart , uint16_t actionsLength )
: dfa ( dfa )
, actionsStart ( actionsStart )
, actionsLength ( actionsLength )
2015-05-21 01:15:43 +00:00
, state ( Valid )
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
{
StringHasher hasher ;
2015-04-23 19:56:57 +00:00
hasher . addCharactersAssumingAligned ( reinterpret_cast < const UChar * > ( & dfa - > actions [ actionsStart ] ) , actionsLength * sizeof ( uint64_t ) / sizeof ( UChar ) ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
hash = hasher . hash ( ) ;
}
bool isEmptyValue ( ) const { return state = = Empty ; }
bool isDeletedValue ( ) const { return state = = Deleted ; }
2019-12-21 17:18:32 +00:00
unsigned hash { 0 } ;
2015-04-23 19:56:57 +00:00
2019-12-21 17:18:32 +00:00
const DFA * dfa { nullptr } ;
uint32_t actionsStart { 0 } ;
uint16_t actionsLength { 0 } ;
2015-04-23 19:56:57 +00:00
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
enum {
Empty ,
2019-12-21 17:18:32 +00:00
Valid ,
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
Deleted
2019-12-21 17:18:32 +00:00
} state { Empty } ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
} ;
struct ActionKeyHash {
static unsigned hash ( const ActionKey & actionKey )
{
return actionKey . hash ;
}
2015-05-21 01:15:43 +00:00
static bool equal ( const ActionKey & a , const ActionKey & b )
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
{
2015-04-23 19:56:57 +00:00
if ( a . state ! = b . state
| | a . dfa ! = b . dfa
| | a . actionsLength ! = b . actionsLength )
return false ;
for ( uint16_t i = 0 ; i < a . actionsLength ; + + i ) {
if ( a . dfa - > actions [ a . actionsStart + i ] ! = a . dfa - > actions [ b . actionsStart + i ] )
return false ;
}
return true ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
static const bool safeToCompareToEmptyOrDeleted = false ;
} ;
struct ActionKeyHashTraits : public WTF : : CustomHashTraits < ActionKey > {
static const bool emptyValueIsZero = true ;
} ;
} // anonymous namespace.
2015-04-23 19:56:57 +00:00
void DFAMinimizer : : minimize ( DFA & dfa )
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
{
2015-04-23 19:56:57 +00:00
FullGraphPartition fullGraphPartition ( dfa ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
// Unlike traditional minimization final states can be differentiated by their action.
// Instead of creating a single set for the final state, we partition by actions from
// the start.
HashMap < ActionKey , Vector < unsigned > , ActionKeyHash , ActionKeyHashTraits > finalStates ;
2015-04-23 19:56:57 +00:00
for ( unsigned i = 0 ; i < dfa . nodes . size ( ) ; + + i ) {
const DFANode & node = dfa . nodes [ i ] ;
if ( node . hasActions ( ) ) {
// FIXME: Sort the actions in the dfa to make nodes that have the same actions in different order equal.
auto addResult = finalStates . add ( ActionKey ( & dfa , node . actionsStart ( ) , node . actionsLength ( ) ) , Vector < unsigned > ( ) ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
addResult . iterator - > value . append ( i ) ;
}
}
for ( const auto & slot : finalStates ) {
for ( unsigned finalStateIndex : slot . value )
fullGraphPartition . markNode ( finalStateIndex ) ;
fullGraphPartition . refinePartitions ( ) ;
}
// Use every splitter to refine the node partitions.
fullGraphPartition . splitByUniqueTransitions ( ) ;
Vector < unsigned > relocationVector ;
2015-04-23 19:56:57 +00:00
relocationVector . reserveInitialCapacity ( dfa . nodes . size ( ) ) ;
for ( unsigned i = 0 ; i < dfa . nodes . size ( ) ; + + i )
relocationVector . uncheckedAppend ( i ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
2015-07-06 21:06:30 +00:00
// Update all the transitions.
2015-04-23 19:56:57 +00:00
for ( unsigned i = 0 ; i < dfa . nodes . size ( ) ; + + i ) {
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
unsigned replacement = fullGraphPartition . nodeReplacement ( i ) ;
if ( i ! = replacement ) {
relocationVector [ i ] = replacement ;
2015-04-23 19:56:57 +00:00
dfa . nodes [ i ] . kill ( dfa ) ;
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
}
2015-07-06 21:06:30 +00:00
dfa . root = relocationVector [ dfa . root ] ;
2015-04-23 19:56:57 +00:00
for ( DFANode & node : dfa . nodes ) {
2015-07-06 21:06:30 +00:00
if ( node . isKilled ( ) )
continue ;
for ( auto & transition : node . transitions ( dfa ) ) {
uint32_t target = transition . target ( ) ;
uint32_t relocatedTarget = relocationVector [ target ] ;
if ( target ! = relocatedTarget )
transition . resetTarget ( relocatedTarget ) ;
}
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Reviewed by Alex Christensen.
Source/WebCore:
This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.
The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.
Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.
The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.
Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.
We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.
Let say we have the following graph.
1 --a--> 2
1 --b--> 3
2 --c--> 4 (final)
3 --c--> 4 (final)
The partition of transition would start with:
Set 0:
1 --a--> 2
Set 1:
1 --b--> 3
Set 2:
2 --c--> 4
3 --c--> 4
The state partition would have a single set with { 1, 2, 3, 4 }.
Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.
We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.
We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
to process.
In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.
In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.
There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
---
Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.
Let's take the following example:
1 --a--> 3
2 --a--> 3
1 -[f]-> 4
2 -[f]-> 5
and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.
Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.
Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.
The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.
This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.
---
Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
Canonical link: https://commits.webkit.org/161756@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@182808 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2015-04-14 21:08:02 +00:00
}
}
} // namespace ContentExtensions
} // namespace WebCore
# endif // ENABLE(CONTENT_EXTENSIONS)