An HTML5-capable lexer.
Public Member Functions | |
HtmlLexer (string str, Node context) | |
void | CheckAfterBodyStack () |
Checks if all elements on the stack are ok to be open in the AfterBody mode. More... | |
void | Reset () |
Resets the insertion mode. http://www.w3.org/html/wg/drafts/html/master/syntax.html#the-insertion-mode More... | |
void | Parse () |
Parses the whole string. More... | |
void | Push (Element el, bool stack) |
Pushes a new open element. More... | |
void | Process (Node node, string close) |
void | Process (Node node, string close, int mode) |
void | SkipNewline () |
Used by e.g. pre; skips a newline if there is one. More... | |
void | CloseParagraph () |
void | CloseParagraphThenAdd (Element el) |
Closes a paragraph in button scope then pushes the given element. More... | |
void | CloseParagraphButtonScope () |
void | InTableElse (Node node, string close) |
Handles all other nodes when in the 'in table' mode. More... | |
void | CloseTableZoneInCell (string close) |
Closes the cell if the given close tag is in scope, then reprocesses it. More... | |
void | AfterHeadHeadTag (Node node) |
Handles head-favouring tags when in the 'after head' mode. Base, link, meta etc are examples of favouring tags; they prefer to be in the head. More... | |
void | CloseToTableBodyIfBody (Node node, string close) |
Closes to table body context if tbody, thead or tfoot are in scope. More... | |
void | CloseCaption (Node node, string close) |
Closes a caption (if it's in scope) and reprocesses the node in table mode. More... | |
void | CloseIfThOrTr (Node node, string close) |
Triggers CloseCell if th or td are in scope. More... | |
void | TableBodyIfTrInScope (Node node, string close) |
Closes to a table context and switches to table body if a tr is in scope. More... | |
void | CloseSelect (bool skipScopeCheck, Node node, string close) |
void | CloseCell () |
Closes a table cell. More... | |
void | BeforeHtmlElse (Node node, string close) |
void | AdoptionAgencyAlgorithm (string tag) |
This attempts to recover mis-nested tags. For example, markup such as <b><i>Hi!</b></i> is relatively common. This is aka the Heisenberg algorithm, but it's named 'adoption agency' in HTML5. More... | |
Element | FormattingCurrentlyOpen (string tagName) |
Checks if the named tag is currently open on the formatting stack. More... | |
void | AddFormatting (Element element) |
Adds a formatting element. More... | |
void | ClearFormatting () |
Clears formatting info to the last marker. More... | |
void | AddScopeMarker () |
Adds a formatting scope marker. More... | |
void | ReconstructFormatting () |
Reconstruct the list of active formatting elements, if any. More... | |
void | CloseMarkedFormattingElement (string close) |
Closes a marked formatting element like object or applet. More... | |
void | AddMarkedFormattingElement (Element el) |
Adds a marked formatting element like object or applet. More... | |
void | AddFormattingElement (Element el) |
bool | IsInListItemScope (string tagName) |
True if the given tag is in list item scope. More... | |
bool | IsInScope (string tagName) |
True if the given tag is in element scope. More... | |
bool | IsInButtonScope (string tagName) |
True if the given tag is in button scope. More... | |
bool | IsInTableScope (string tagName) |
True if the given tag is in table scope. More... | |
bool | IsInSelectScope (string tagName) |
True if the given tag is in select scope. More... | |
void | CloseInclusive (string tag) |
void | CloseNodesFrom (int index) |
Closes all nodes from the given open element stack index. Inclusive. More... | |
void | CloseToTableRowContext () |
Close to a table row context. tr, html and template. More... | |
void | CloseToTableBodyContext () |
Close to a table body context. thead, tfoot, tbody, html and template. More... | |
void | CloseToTableContext () |
Close to a table context. More... | |
void | InputOrTextareaInSelect (Element el) |
Input or textarea in select mode. More... | |
void | RawTextOrRcDataAlgorithm (Element el, HtmlParseMode stateAfter) |
'Generic raw text element parsing algorithm'. Adds the current node then switches to the given state, whilst also changing the mode to Text. More... | |
void | AfterHeadElse (Node node, string close) |
Anything else in the 'after head' mode. More... | |
void | InHeadElse (Node node, string close) |
Anything else in the 'in head' mode. More... | |
void | CombineInto (Element el, Element target) |
Combines the attribs of the given element into target. Adds the attributes to target if they don't exist (doesn't overwrite). More... | |
void | BlockClose (string close) |
Attempts to close a block element. More... | |
bool | TagCurrentlyOpen (string tagName) |
Checks if the named tag is currently open. More... | |
void | TemplateStep (Node node, string close, int mode) |
Inserting something in the template. More... | |
void | CloseTemplate () |
Closes the template element. More... | |
void | Finish () |
Generate implicit end tags. More... | |
void | GenerateImpliedEndTags () |
Generates implied end tags. More... | |
void | GenerateImpliedEndTagsThorough () |
Generates all implied end tags thoroughly. More... | |
void | GenerateImpliedEndTagsExceptFor (string tagName) |
Generates implied end tags, except for the given tag name. More... | |
void | CloseNode (Element el) |
void | CloseCurrentNode () |
Pops the last node from the stack of open nodes. More... | |
bool | CallCloseMethod (string tag, int mode) |
Calls Element.OnLexerCloseNode. Note that it's an instance method but it can be called without an instance when the DOM isn't balanced. For example, a balanced DOM will have a 'div' on the open element stack, and we want to handle its /div tag when it shows up. This would directly invoke close on that open element. If we're not balanced, it obtains SupportedTagMeta.CloseMethod and invokes it with a null instance. See SupportedTagMeta.CloseMethod for more. More... | |
Element | CreateTag (string tag, bool callLoad) |
Creates an element from the given namespace/tag name. More... | |
Public Member Functions inherited from Dom.StringReader | |
StringReader (byte[] str) | |
Creates a new reader for the raw single-byte encoded string. Useful if you're talking to e.g. a webserver with a binary protocol. More... | |
StringReader (string str) | |
Creates a new reader for the given string. More... | |
bool | More () |
Checks if there is anything left to read. More... | |
bool | Peek (string str) |
Checks if the given string is next. More... | |
bool | PeekLower (string str) |
Checks if the given string is next; it checks by lowercasing the target character. More... | |
char | Peek () |
Takes a peek at the next character in the stream without reading it. More... | |
char | Peek (int delta) |
Takes a peek at the character that is a number of characters away from the next one without actually reading it. Peek(0) is the next character, Peek(1) is the one after that etc. More... | |
void | StepBack () |
Steps back one place in the stream. More... | |
void | Advance () |
Steps forward one place in the stream. More... | |
void | Advance (int places) |
Steps forward the given number of places in the stream. More... | |
int | Length () |
The length of the string. More... | |
string | ReadString (int length) |
Reads a substring of the given length. Note that this does not do bounds checking. More... | |
virtual char | Read () |
Reads a character from the stream and advances the stream one place. More... | |
void | ReadUntil (char character) |
Keeps reading the given character from the stream until it's no longer next. Used for e.g. stripping a run of whitespace of unknown length from the stream. More... | |
void | ReadOff (char[] chars) |
Keeps reading from the stream until no characters in the given set are next. Used for e.g. stripping an unknown number of newlines (\r or \n) from this stream. More... | |
void | ReadOff (char[] chars, out int count) |
Keeps reading from the stream until no characters in the given set are next. Used for e.g. stripping an unknown number of newlines (\r or \n) from this stream. More... | |
int | NextIndexOf (char character) |
Gets the next index of the given character. The length is returned if it wasn't found at all. More... | |
int | NextIndexOf (char character, int limit) |
Gets the next index of the given character, up to limit. Limit is returned if it wasn't found at all. More... | |
virtual int | GetLineNumber () |
Gets the line number that the pointer is currently at. More... | |
int | GetLineNumber (out int charOnLine) |
Gets the line number and character number that the pointer is currently at. More... | |
string | ReadLine (int lineNumber) |
Reads the numbered line from this stream. More... | |
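The peek and read semantics listed above can be illustrated with a small stand-alone sketch. MiniReader is a hypothetical stand-in written for this page, not the Dom.StringReader source. It assumes only what the descriptions state: Peek(0) is the next character, the null character comes back when working beyond the end of the stream, and ReadOff skips characters for as long as the next one is in the given set.

```csharp
using System;

// Hypothetical stand-in for Dom.StringReader; not the PowerUI source.
public class MiniReader
{
    // Returned when peeking or reading beyond the end of the input.
    public const char NULL = '\0';

    public readonly string Input;
    public int Position;

    public MiniReader(string input)
    {
        Input = input;
        Position = 0;
    }

    // True while there is anything left to read.
    public bool More()
    {
        return Position < Input.Length;
    }

    // Peek(0) is the next character, Peek(1) is the one after that, etc.
    public char Peek(int delta = 0)
    {
        int index = Position + delta;
        return (index >= 0 && index < Input.Length) ? Input[index] : NULL;
    }

    // Returns the next character and advances the stream one place.
    public char Read()
    {
        return More() ? Input[Position++] : NULL;
    }

    // Skips characters for as long as the next one is in the given set,
    // counting how many were skipped.
    public void ReadOff(char[] chars, out int count)
    {
        count = 0;
        while (More() && Array.IndexOf(chars, Peek()) >= 0)
        {
            Position++;
            count++;
        }
    }
}
```

Returning a sentinel character instead of throwing keeps tokenizer loops free of bounds checks, at the cost of not distinguishing a literal null character in the input from the end of the stream unless More() is also consulted.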
Static Public Member Functions | |
static bool | IsAsciiLetter (char c) |
Determines if the given character is an upper- or lowercase ASCII letter. More... | |
static bool | IsSpaceCharacter (char c) |
True if the given char is any of the HTML5 space characters (includes newlines etc). More... | |
Static Public Member Functions inherited from Dom.StringReader | |
static int | NextIndexOf (int position, string input, char character, int limit) |
Gets the next index of the given character, up to limit. Limit is returned if it wasn't found at all. More... | |
static int | NextIndexOf (int position, string input, char character) |
Gets the next index of the given character. The length is returned if it wasn't found at all. More... | |
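The "not found" convention of the two static NextIndexOf overloads can be sketched as follows. IndexSketch is a hypothetical class, not the Dom.StringReader source, and it assumes that the search starts at position itself and that limit is an exclusive upper bound; neither detail is stated in the descriptions above.

```csharp
using System;

// Hypothetical sketch of the documented behaviour; not the PowerUI source.
public static class IndexSketch
{
    // Returns the index of the first occurrence of character at or after
    // position and before limit. Returns limit if it wasn't found at all.
    public static int NextIndexOf(int position, string input, char character, int limit)
    {
        int end = Math.Min(limit, input.Length);
        for (int i = position; i < end; i++)
        {
            if (input[i] == character)
            {
                return i;
            }
        }
        return limit;
    }

    // Same search with no limit. Returns the input length if it wasn't found.
    public static int NextIndexOf(int position, string input, char character)
    {
        return NextIndexOf(position, input, character, input.Length);
    }
}
```

Returning the limit (or length) rather than -1 means the result can be used directly as the end of a substring, so "read up to the next '>' or to the end" needs no special case.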
Public Attributes | |
HtmlParseMode | State |
Gets or sets the current parse mode. More... | |
MLNamespace | Namespace |
Current namespace. Defaults to XHTML (for all our HTML tags). More... | |
Document | Document |
Document we're adding to. More... | |
readonly List< Element > | OpenElements |
readonly Stack< int > | TemplateModes |
readonly List< Element > | FormattingElements |
int | PreviousMode = HtmlTreeMode.Initial |
The previous tree mode. More... | |
int | CurrentMode = HtmlTreeMode.Initial |
The current tree mode. More... | |
int | TextBlockLength |
The length of the current text buffer. More... | |
System.Text.StringBuilder | Builder =new System.Text.StringBuilder() |
A string builder used for constructing tokens. More... | |
Element | head |
The head pointer. More... | |
Element | form |
The form pointer. More... | |
TextNode | PendingTableCharacters |
The pending table chars 'list' (we only ever add one to it). More... | |
string | LastStartTag |
The last created start tag name (lowercase). More... | |
bool | FramesetOk =true |
Frameset-ok flag. More... | |
bool | _foster =false |
Table foster parenting. Occurs when tables are mis-nested and affects how elements are added. More... | |
Public Attributes inherited from Dom.StringReader | |
string | Input |
The original string. More... | |
int | Position |
The current position this reader is at in the string. More... | |
int | InputLength |
The length of the input string. More... | |
Properties | |
static MLNamespace | XHTMLNamespace [get] |
The XML namespace for XHTML. More... | |
string | CurrentTag [get] |
The current tag on the top of the stack. More... | |
Node | CurrentNode [get] |
The current open node. More... | |
Element | CurrentElement [get] |
The current open element. More... | |
Private Member Functions | |
int | GetAppropriateEnd (out bool closing) |
Keeps reading until </lastStartTag> is seen. More... | |
string | ReadRawTag (bool open, bool withName) |
Reads the contents of an open/close tag. More... | |
void | EndTag () |
void | OpenPCTag () |
Comment | FlushCommentNode (int positionDelta) |
void | LoadComment () |
Reads a comment's body. More... | |
bool | CommentDashEnd () |
See 8.2.4.49 Comment end dash state More... | |
bool | CommentEnd () |
Checks if the comment has ended. More... | |
void | OpenRCTag () |
bool | CreateIfAppropriate (char c) |
Creates a close tag if one is appropriate. More... | |
void | BogusComment () |
See 8.2.4.44 Bogus comment state More... | |
void | HandleText (bool stopAtTag, bool allowVars) |
Creates a text content block. More... | |
void | AddElementWithFoster (Element element) |
void | InBodyEndTagElse (string close) |
Any other end tag has been found in the InBody state. More... | |
void | FlushComment () |
Writes out any pending text as a comment node. More... | |
TextNode | FlushTextNode () |
Writes out any pending text to a text element. More... | |
TextNode | AppendText (TextNode node, string text) |
Appends text to the given node or creates a new node if it's null. More... | |
void | AddVariable () |
Reads out a variable (as used by PowerUI for localization purposes). More... | |
Private Attributes | |
TextNode | text_ |
The latest added text node. Gets cleared whenever Process is called. More... | |
Static Private Attributes | |
static MLNamespace | _XHTMLNamespace |
Cached reference for the XHTML namespace. More... | |
Additional Inherited Members | |
Static Public Attributes inherited from Dom.StringReader | |
static char | NULL ='\0' |
The null character. This is returned when operations are working beyond the end of the stream. More... | |
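Constructor & Destructor Documentation
Dom.HtmlLexer.HtmlLexer (string str, Node context) |
inline |
Member Function Documentation
void Dom.HtmlLexer.AddElementWithFoster (Element element) |
inlineprivate |
void Dom.HtmlLexer.AddFormatting (Element element) |
inline |
Adds a formatting element.
void Dom.HtmlLexer.AddFormattingElement (Element el) |
inline |
void Dom.HtmlLexer.AddMarkedFormattingElement (Element el) |
inline |
Adds a marked formatting element like object or applet.
void Dom.HtmlLexer.AddScopeMarker () |
inline |
Adds a formatting scope marker.
void Dom.HtmlLexer.AddVariable () |
inlineprivate |
Reads out a variable (as used by PowerUI for localization purposes).
void Dom.HtmlLexer.AdoptionAgencyAlgorithm (string tag) |
inline |
This attempts to recover mis-nested tags. For example, markup such as <b><i>Hi!</b></i> is relatively common. This is aka the Heisenberg algorithm, but it's named 'adoption agency' in HTML5.
tag | The actual tag given. |
void Dom.HtmlLexer.AfterHeadElse (Node node, string close) |
inline |
Anything else in the 'after head' mode.
void Dom.HtmlLexer.AfterHeadHeadTag (Node node) |
inline |
Handles head-favouring tags when in the 'after head' mode. Base, link, meta etc are examples of favouring tags; they prefer to be in the head.
TextNode Dom.HtmlLexer.AppendText (TextNode node, string text) |
private |
Appends text to the given node or creates a new node if it's null.
void Dom.HtmlLexer.BeforeHtmlElse (Node node, string close) |
inline |
void Dom.HtmlLexer.BlockClose (string close) |
inline |
Attempts to close a block element.
void Dom.HtmlLexer.BogusComment () |
inlineprivate |
See 8.2.4.44 Bogus comment state
c | The current character. |
bool Dom.HtmlLexer.CallCloseMethod (string tag, int mode) |
inline |
Calls Element.OnLexerCloseNode. Note that it's an instance method but it can be called without an instance when the DOM isn't balanced. For example, a balanced DOM will have a 'div' on the open element stack, and we want to handle its /div tag when it shows up. This would directly invoke close on that open element. If we're not balanced, it obtains SupportedTagMeta.CloseMethod and invokes it with a null instance. See SupportedTagMeta.CloseMethod for more.
void Dom.HtmlLexer.CheckAfterBodyStack () |
inline |
Checks if all elements on the stack are ok to be open in the AfterBody mode.
void Dom.HtmlLexer.ClearFormatting () |
inline |
Clears formatting info to the last marker.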
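How AddScopeMarker and ClearFormatting interact can be sketched with a plain list. This is a minimal illustration, not the HtmlLexer source: FormattingSketch is a hypothetical class, entries are tag names rather than Element instances, and a null entry is assumed to stand for a scope marker.

```csharp
using System.Collections.Generic;

// Hypothetical sketch; the real list holds Element instances.
public static class FormattingSketch
{
    // Assumption: a null entry represents a scope marker.
    public static void AddScopeMarker(List<string> formatting)
    {
        formatting.Add(null);
    }

    // Removes entries from the end of the list, up to and including
    // the last marker. Empties the list if there is no marker.
    public static void ClearToLastMarker(List<string> formatting)
    {
        while (formatting.Count > 0)
        {
            int last = formatting.Count - 1;
            string entry = formatting[last];
            formatting.RemoveAt(last);
            if (entry == null)
            {
                break;
            }
        }
    }
}
```

Markers isolate formatting opened inside elements such as object or applet, so that leaving one of them discards only the formatting entries added since it was opened.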
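void Dom.HtmlLexer.CloseCaption (Node node, string close) |
inline |
Closes a caption (if it's in scope) and reprocesses the node in table mode.
void Dom.HtmlLexer.CloseCell () |
inline |
Closes a table cell.
void Dom.HtmlLexer.CloseCurrentNode () |
inline |
Pops the last node from the stack of open nodes.
void Dom.HtmlLexer.CloseIfThOrTr (Node node, string close) |
inline |
Triggers CloseCell if th or td are in scope.
void Dom.HtmlLexer.CloseInclusive (string tag) |
inline |
void Dom.HtmlLexer.CloseMarkedFormattingElement (string close) |
inline |
Closes a marked formatting element like object or applet.
void Dom.HtmlLexer.CloseNode (Element el) |
inline |
void Dom.HtmlLexer.CloseNodesFrom (int index) |
inline |
Closes all nodes from the given open element stack index. Inclusive.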
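The "inclusive" behaviour of CloseNodesFrom can be shown on a list of tag names. StackSketch is a hypothetical class, not the HtmlLexer source; the real method works on Element instances and presumably runs close handling for each node, which this sketch omits.

```csharp
using System.Collections.Generic;

// Hypothetical sketch; the real stack holds Element instances.
public static class StackSketch
{
    // Removes every entry from the top of the stack (the end of the list)
    // down to and including the entry at index.
    public static void CloseNodesFrom(List<string> open, int index)
    {
        if (index < 0)
        {
            index = 0;
        }
        for (int i = open.Count - 1; i >= index; i--)
        {
            open.RemoveAt(i);
        }
    }
}
```

For a stack of html, body, div, p, span, calling CloseNodesFrom with index 2 leaves html and body open.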
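void Dom.HtmlLexer.CloseParagraph () |
inline |
void Dom.HtmlLexer.CloseParagraphButtonScope () |
inline |
void Dom.HtmlLexer.CloseParagraphThenAdd (Element el) |
inline |
Closes a paragraph in button scope then pushes the given element.
void Dom.HtmlLexer.CloseSelect (bool skipScopeCheck, Node node, string close) |
inline |
void Dom.HtmlLexer.CloseTableZoneInCell (string close) |
inline |
Closes the cell if the given close tag is in scope, then reprocesses it.
void Dom.HtmlLexer.CloseTemplate () |
inline |
Closes the template element.
void Dom.HtmlLexer.CloseToTableBodyContext () |
inline |
Close to a table body context. thead, tfoot, tbody, html and template.
void Dom.HtmlLexer.CloseToTableBodyIfBody (Node node, string close) |
inline |
Closes to table body context if tbody, thead or tfoot are in scope.
void Dom.HtmlLexer.CloseToTableContext () |
inline |
Close to a table context.
void Dom.HtmlLexer.CloseToTableRowContext () |
inline |
Close to a table row context. tr, html and template.
void Dom.HtmlLexer.CombineInto (Element el, Element target) |
Combines the attribs of the given element into target. Adds the attributes to target if they don't exist (doesn't overwrite).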
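The non-overwriting merge that CombineInto describes can be sketched with dictionaries. AttributeSketch is a hypothetical class, not the HtmlLexer source; the real method takes Element instances, and plain string dictionaries are an assumption made to keep the example self-contained.

```csharp
using System.Collections.Generic;

// Hypothetical sketch; attributes are modelled as name/value pairs.
public static class AttributeSketch
{
    // Copies each attribute from source into target only if target
    // does not already define it. Existing values are never overwritten.
    public static void CombineInto(IDictionary<string, string> source,
                                   IDictionary<string, string> target)
    {
        foreach (KeyValuePair<string, string> pair in source)
        {
            if (!target.ContainsKey(pair.Key))
            {
                target[pair.Key] = pair.Value;
            }
        }
    }
}
```

In the HTML5 parsing rules this kind of merge applies when a second html or body start tag appears: attributes on the stray tag are added to the existing element, but the existing element's values win.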
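bool Dom.HtmlLexer.CommentDashEnd () |
inlineprivate |
See 8.2.4.49 Comment end dash state
bool Dom.HtmlLexer.CommentEnd () |
inlineprivate |
Checks if the comment has ended.
bool Dom.HtmlLexer.CreateIfAppropriate (char c) |
inlineprivate |
Creates a close tag if one is appropriate.
Element Dom.HtmlLexer.CreateTag (string tag, bool callLoad) |
inline |
Creates an element from the given namespace/tag name.
void Dom.HtmlLexer.EndTag () |
inlineprivate |
void Dom.HtmlLexer.Finish () |
inline |
Generate implicit end tags.
void Dom.HtmlLexer.FlushComment () |
inlineprivate |
Writes out any pending text as a comment node.
Comment Dom.HtmlLexer.FlushCommentNode (int positionDelta) |
inlineprivate |
TextNode Dom.HtmlLexer.FlushTextNode () |
inlineprivate |
Writes out any pending text to a text element.
Element Dom.HtmlLexer.FormattingCurrentlyOpen (string tagName) |
inline |
Checks if the named tag is currently open on the formatting stack.
void Dom.HtmlLexer.GenerateImpliedEndTags () |
inline |
Generates implied end tags.
void Dom.HtmlLexer.GenerateImpliedEndTagsExceptFor (string tagName) |
inline |
Generates implied end tags, except for the given tag name.
void Dom.HtmlLexer.GenerateImpliedEndTagsThorough () |
inline |
Generates all implied end tags thoroughly.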
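The three GenerateImpliedEndTags variants differ only in which elements they are willing to pop. The sketch below follows the tag lists in the HTML5 parsing rules; ImpliedEndTagSketch is a hypothetical class working on tag names, not the HtmlLexer source, and whether HtmlLexer uses exactly these lists is an assumption.

```csharp
using System.Collections.Generic;

// Hypothetical sketch based on the HTML5 "implied end tags" rules.
public static class ImpliedEndTagSketch
{
    // Elements whose end tags may be implied.
    static readonly HashSet<string> Implied = new HashSet<string>
        { "dd", "dt", "li", "optgroup", "option", "p", "rb", "rp", "rt", "rtc" };

    // Extra elements covered by the "thorough" variant.
    static readonly HashSet<string> ThoroughExtra = new HashSet<string>
        { "caption", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr" };

    // Pops from the top of the stack (the end of the list) while the current
    // node is in the implied set and is not the excepted tag.
    public static void Generate(List<string> open, string except = null, bool thorough = false)
    {
        while (open.Count > 0)
        {
            string tag = open[open.Count - 1];
            bool implied = Implied.Contains(tag) || (thorough && ThoroughExtra.Contains(tag));
            if (!implied || tag == except)
            {
                break;
            }
            open.RemoveAt(open.Count - 1);
        }
    }
}
```

With html, body, ul, li, p on the stack, the plain form pops p and li and stops at ul, while passing "li" as the exception pops only p.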
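int Dom.HtmlLexer.GetAppropriateEnd (out bool closing) |
inlineprivate |
Keeps reading until </lastStartTag> is seen.
void Dom.HtmlLexer.HandleText (bool stopAtTag, bool allowVars) |
inlineprivate |
Creates a text content block.
void Dom.HtmlLexer.InBodyEndTagElse (string close) |
inlineprivate |
Any other end tag has been found in the InBody state.
tag | The actual tag found. |
void Dom.HtmlLexer.InHeadElse (Node node, string close) |
inline |
Anything else in the 'in head' mode.
void Dom.HtmlLexer.InputOrTextareaInSelect (Element el) |
inline |
Input or textarea in select mode.
void Dom.HtmlLexer.InTableElse (Node node, string close) |
inline |
Handles all other nodes when in the 'in table' mode.
static bool Dom.HtmlLexer.IsAsciiLetter (char c) |
inlinestatic |
Determines if the given character is an upper- or lowercase ASCII letter.
c | The character to examine. |
bool Dom.HtmlLexer.IsInButtonScope (string tagName) |
inline |
True if the given tag is in button scope.
bool Dom.HtmlLexer.IsInListItemScope (string tagName) |
inline |
True if the given tag is in list item scope.
bool Dom.HtmlLexer.IsInScope (string tagName) |
inline |
True if the given tag is in element scope.
bool Dom.HtmlLexer.IsInSelectScope (string tagName) |
inline |
True if the given tag is in select scope.
bool Dom.HtmlLexer.IsInTableScope (string tagName) |
inline |
True if the given tag is in table scope.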
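The five scope checks (IsInScope, IsInButtonScope, IsInListItemScope, IsInTableScope, IsInSelectScope) share one walk over the stack of open elements and differ only in which elements end the search. The sketch below follows the "has an element in scope" rules of the HTML5 parsing algorithm. ScopeSketch is a hypothetical class working on tag names, not the HtmlLexer source, and it leaves out the MathML and SVG boundary elements for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the HTML5 scope rules; HTML elements only.
public static class ScopeSketch
{
    static readonly HashSet<string> Default = new HashSet<string>
        { "applet", "caption", "html", "table", "td", "th", "marquee", "object", "template" };
    static readonly HashSet<string> Button = new HashSet<string>(Default) { "button" };
    static readonly HashSet<string> ListItem = new HashSet<string>(Default) { "ol", "ul" };
    static readonly HashSet<string> Table = new HashSet<string> { "html", "table", "template" };

    // Walks from the current node (the end of the list) towards the root.
    // Finding the tag first means it is in scope; hitting a boundary first means it is not.
    static bool InScope(IList<string> open, string tagName, Func<string, bool> isBoundary)
    {
        for (int i = open.Count - 1; i >= 0; i--)
        {
            if (open[i] == tagName) return true;
            if (isBoundary(open[i])) return false;
        }
        return false;
    }

    public static bool IsInScope(IList<string> open, string tagName)
    {
        return InScope(open, tagName, Default.Contains);
    }

    public static bool IsInButtonScope(IList<string> open, string tagName)
    {
        return InScope(open, tagName, Button.Contains);
    }

    public static bool IsInListItemScope(IList<string> open, string tagName)
    {
        return InScope(open, tagName, ListItem.Contains);
    }

    public static bool IsInTableScope(IList<string> open, string tagName)
    {
        return InScope(open, tagName, Table.Contains);
    }

    // Select scope is inverted: everything except optgroup and option is a boundary.
    public static bool IsInSelectScope(IList<string> open, string tagName)
    {
        return InScope(open, tagName, t => t != "optgroup" && t != "option");
    }
}
```

For example, with html, body, p, button, span open, p is in element scope but not in button scope, because the walk reaches button before it reaches p.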
static bool Dom.HtmlLexer.IsSpaceCharacter (char c) |
inlinestatic |
True if the given char is any of the HTML5 space characters (includes newlines etc).
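The two static character tests, IsAsciiLetter and IsSpaceCharacter, can be written out explicitly. CharSketch is a hypothetical class, not the HtmlLexer source. The space set used here is the one HTML5 defines: tab, line feed, form feed, carriage return and space.

```csharp
// Hypothetical sketch of the two character classes.
public static class CharSketch
{
    // True for A-Z and a-z only; non-ASCII letters are excluded.
    public static bool IsAsciiLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // True for the HTML5 space characters:
    // U+0009 tab, U+000A line feed, U+000C form feed, U+000D carriage return, U+0020 space.
    public static bool IsSpaceCharacter(char c)
    {
        return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
    }
}
```

These are deliberately narrower than char.IsLetter and char.IsWhiteSpace, which also accept non-ASCII letters and other Unicode whitespace such as the no-break space.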
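void Dom.HtmlLexer.LoadComment () |
inlineprivate |
Reads a comment's body.
void Dom.HtmlLexer.OpenPCTag () |
inlineprivate |
void Dom.HtmlLexer.OpenRCTag () |
inlineprivate |
void Dom.HtmlLexer.Parse () |
inline |
Parses the whole string.
void Dom.HtmlLexer.Process (Node node, string close) |
inline |
void Dom.HtmlLexer.Process (Node node, string close, int mode) |
inline |
void Dom.HtmlLexer.Push (Element el, bool stack) |
inline |
Pushes a new open element.
void Dom.HtmlLexer.RawTextOrRcDataAlgorithm (Element el, HtmlParseMode stateAfter) |
inline |
'Generic raw text element parsing algorithm'. Adds the current node then switches to the given state, whilst also changing the mode to Text.
string Dom.HtmlLexer.ReadRawTag (bool open, bool withName) |
inlineprivate |
Reads the contents of an open/close tag.
void Dom.HtmlLexer.ReconstructFormatting () |
inline |
Reconstruct the list of active formatting elements, if any.
void Dom.HtmlLexer.Reset () |
inline |
Resets the insertion mode. http://www.w3.org/html/wg/drafts/html/master/syntax.html#the-insertion-mode
void Dom.HtmlLexer.SkipNewline () |
inline |
Used by e.g. pre; skips a newline if there is one.
void Dom.HtmlLexer.TableBodyIfTrInScope (Node node, string close) |
inline |
Closes to a table context and switches to table body if a tr is in scope.
bool Dom.HtmlLexer.TagCurrentlyOpen (string tagName) |
inline |
Checks if the named tag is currently open.
void Dom.HtmlLexer.TemplateStep (Node node, string close, int mode) |
inline |
Inserting something in the template.
token | The token to insert. |
mode | The mode to push. |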
Member Data Documentation
bool Dom.HtmlLexer._foster =false |
Table foster parenting. Occurs when tables are mis-nested and affects how elements are added.
MLNamespace Dom.HtmlLexer._XHTMLNamespace |
staticprivate |
Cached reference for the XHTML namespace.
System.Text.StringBuilder Dom.HtmlLexer.Builder =new System.Text.StringBuilder() |
A string builder used for constructing tokens.
int Dom.HtmlLexer.CurrentMode = HtmlTreeMode.Initial |
The current tree mode.
Element Dom.HtmlLexer.form |
The form pointer.
readonly List<Element> Dom.HtmlLexer.FormattingElements |
bool Dom.HtmlLexer.FramesetOk =true |
Frameset-ok flag.
Element Dom.HtmlLexer.head |
The head pointer.
string Dom.HtmlLexer.LastStartTag |
The last created start tag name (lowercase).
MLNamespace Dom.HtmlLexer.Namespace |
Current namespace. Defaults to XHTML (for all our HTML tags).
readonly List<Element> Dom.HtmlLexer.OpenElements |
TextNode Dom.HtmlLexer.PendingTableCharacters |
The pending table chars 'list' (we only ever add one to it).
int Dom.HtmlLexer.PreviousMode = HtmlTreeMode.Initial |
The previous tree mode.
HtmlParseMode Dom.HtmlLexer.State |
Gets or sets the current parse mode.
readonly Stack<int> Dom.HtmlLexer.TemplateModes |
TextNode Dom.HtmlLexer.text_ |
private |
The latest added text node. Gets cleared whenever Process is called.
int Dom.HtmlLexer.TextBlockLength |
The length of the current text buffer.
Property Documentation
Element Dom.HtmlLexer.CurrentElement |
get |
The current open element.
Node Dom.HtmlLexer.CurrentNode |
get |
The current open node.
string Dom.HtmlLexer.CurrentTag |
get |
The current tag on the top of the stack.
MLNamespace Dom.HtmlLexer.XHTMLNamespace |
staticget |
The XML namespace for XHTML.