An HTML5-capable lexer.
Public Member Functions | |
| HtmlLexer (string str, Node context) | |
| void | CheckAfterBodyStack () |
| Checks if all elements on the stack are ok to be open in the AfterBody mode. | |
| void | Reset () |
| Resets the insertion mode. See http://www.w3.org/html/wg/drafts/html/master/syntax.html#the-insertion-mode | |
| void | Parse () |
| Parses the whole string. | |
| void | Push (Element el, bool stack) |
| Pushes a new open element. | |
| void | Process (Node node, string close) |
| void | Process (Node node, string close, int mode) |
| void | SkipNewline () |
| Used by e.g. pre; skips a newline if there is one. | |
| void | CloseParagraph () |
| void | CloseParagraphThenAdd (Element el) |
| Closes a paragraph in button scope then pushes the given element. | |
| void | CloseParagraphButtonScope () |
| void | InTableElse (Node node, string close) |
| The route for all other nodes ('anything else') when in the 'in table' mode. | |
| void | CloseTableZoneInCell (string close) |
| Closes the cell if the given close tag is in scope, then reprocesses it. | |
| void | AfterHeadHeadTag (Node node) |
| Handles head-favouring tags when in the 'after head' mode. Base, link, meta etc. are examples of favouring tags; they prefer to be in the head. | |
| void | CloseToTableBodyIfBody (Node node, string close) |
| Closes to a table body context if tbody, thead or tfoot is in scope. | |
| void | CloseCaption (Node node, string close) |
| Closes a caption (if it's in scope) and reprocesses the node in table mode. | |
| void | CloseIfThOrTr (Node node, string close) |
| Triggers CloseCell if th or td is in scope. | |
| void | TableBodyIfTrInScope (Node node, string close) |
| Closes to a table context and switches to table body if a tr is in scope. | |
| void | CloseSelect (bool skipScopeCheck, Node node, string close) |
| void | CloseCell () |
| Closes a table cell. | |
| void | BeforeHtmlElse (Node node, string close) |
| void | AdoptionAgencyAlgorithm (string tag) |
| Attempts to recover mis-nested tags. For example, mis-nested formatting tags wrapped around text such as 'Hi!' are relatively common. Also known as the Heisenberg algorithm, but it's named 'adoption agency' in HTML5. | |
| Element | FormattingCurrentlyOpen (string tagName) |
| Checks if the named tag is currently open on the formatting stack. | |
| void | AddFormatting (Element element) |
| Adds a formatting element. | |
| void | ClearFormatting () |
| Clears formatting info to the last marker. | |
| void | AddScopeMarker () |
| Adds a formatting scope marker. | |
| void | ReconstructFormatting () |
| Reconstruct the list of active formatting elements, if any. | |
| void | CloseMarkedFormattingElement (string close) |
| Closes a marked formatting element like object or applet. | |
| void | AddMarkedFormattingElement (Element el) |
| Adds a marked formatting element like object or applet. | |
| void | AddFormattingElement (Element el) |
| bool | IsInListItemScope (string tagName) |
| True if the given tag is in list item scope. | |
| bool | IsInScope (string tagName) |
| True if the given tag is in element scope. | |
| bool | IsInButtonScope (string tagName) |
| True if the given tag is in button scope. | |
| bool | IsInTableScope (string tagName) |
| True if the given tag is in table scope. | |
| bool | IsInSelectScope (string tagName) |
| True if the given tag is in select scope. | |
| void | CloseInclusive (string tag) |
| void | CloseNodesFrom (int index) |
| Closes all nodes from the given open element stack index. Inclusive. | |
| void | CloseToTableRowContext () |
| Closes to a table row context: tr, html and template. | |
| void | CloseToTableBodyContext () |
| Closes to a table body context: thead, tfoot, tbody, html and template. | |
| void | CloseToTableContext () |
| Closes to a table context. | |
| void | InputOrTextareaInSelect (Element el) |
| Input or textarea in select mode. | |
| void | RawTextOrRcDataAlgorithm (Element el, HtmlParseMode stateAfter) |
| 'Generic raw text element parsing algorithm'. Adds the current node then switches to the given state, whilst also changing the mode to Text. | |
| void | AfterHeadElse (Node node, string close) |
| Anything else in the 'after head' mode. | |
| void | InHeadElse (Node node, string close) |
| Anything else in the 'in head' mode. | |
| void | CombineInto (Element el, Element target) |
| Combines the attributes of the given element into target. Adds the attributes to target if they don't exist (doesn't overwrite). | |
| void | BlockClose (string close) |
| Attempts to close a block element. | |
| bool | TagCurrentlyOpen (string tagName) |
| Checks if the named tag is currently open. | |
| void | TemplateStep (Node node, string close, int mode) |
| Inserting something in the template. | |
| void | CloseTemplate () |
| Closes the template element. | |
| void | Finish () |
| Generates implied end tags. | |
| void | GenerateImpliedEndTags () |
| Generates implied end tags. | |
| void | GenerateImpliedEndTagsThorough () |
| Generates implied end tags thoroughly. | |
| void | GenerateImpliedEndTagsExceptFor (string tagName) |
| Generates implied end tags, except for the given tag name. | |
| void | CloseNode (Element el) |
| void | CloseCurrentNode () |
| Pops the last node from the stack of open nodes. | |
| bool | CallCloseMethod (string tag, int mode) |
| Calls Element.OnLexerCloseNode. Note that it's an instance method but it can be called without an instance when the DOM isn't balanced. For example, a balanced DOM will have a 'div' on the open element stack, and we want to handle its /div tag when it shows up. This would directly invoke close on that open element. If we're not balanced, it obtains SupportedTagMeta.CloseMethod and invokes it with a null instance. See SupportedTagMeta.CloseMethod for more. | |
| Element | CreateTag (string tag, bool callLoad) |
| Creates an element from the given namespace/tag name. | |
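The scope checks listed above (IsInScope, IsInButtonScope and friends) correspond to the "has an element in scope" test in the HTML Standard. The following is only a rough sketch of that test, not PowerUI's implementation: the class ScopeSketch, the helper InScope and the use of plain tag-name strings instead of Element objects are all assumptions made for illustration, and only HTML-namespace boundary tags are included.

```csharp
using System.Collections.Generic;

// Hypothetical sketch; none of these names come from PowerUI.
public static class ScopeSketch
{
    // Boundary tags for the default "element scope" in the HTML Standard
    // (HTML namespace only; MathML and SVG boundaries are left out).
    static readonly HashSet<string> ElementScope = new HashSet<string> {
        "applet", "caption", "html", "table", "td", "th",
        "marquee", "object", "template"
    };

    // Walks the open element stack from the most recently opened tag down.
    // Finding the target first means it is in scope; hitting a boundary
    // tag first means it is not.
    static bool InScope(IList<string> openTags, string target, ISet<string> boundaries)
    {
        for (int i = openTags.Count - 1; i >= 0; i--)
        {
            string tag = openTags[i];
            if (tag == target) return true;
            if (boundaries.Contains(tag)) return false;
        }
        return false;
    }

    public static bool IsInScope(IList<string> openTags, string target)
    {
        return InScope(openTags, target, ElementScope);
    }

    // Button scope is element scope with "button" as an extra boundary.
    public static bool IsInButtonScope(IList<string> openTags, string target)
    {
        var boundaries = new HashSet<string>(ElementScope) { "button" };
        return InScope(openTags, target, boundaries);
    }
}
```

With an open stack of html, body, p, button, span, the sketch reports p as in element scope but not in button scope, because the walk reaches the button boundary before it reaches p.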
Public Member Functions inherited from Dom.StringReader | |
| StringReader (byte[] str) | |
| Creates a new reader for the raw single-byte encoded string. Useful if you're talking to e.g. a webserver with a binary protocol. | |
| StringReader (string str) | |
| Creates a new reader for the given string. | |
| bool | More () |
| Checks if there is anything left to read. | |
| bool | Peek (string str) |
| Checks if the given string is next. | |
| bool | PeekLower (string str) |
| Checks if the given string is next; it checks by lowercasing the target character. | |
| char | Peek () |
| Takes a peek at the next character in the stream without reading it. | |
| char | Peek (int delta) |
| Takes a peek at the character that is a number of characters away from the next one without actually reading it. Peek(0) is the next character, Peek(1) is the one after that etc. | |
| void | StepBack () |
| Steps back one place in the stream. | |
| void | Advance () |
| Steps forward one place in the stream. | |
| void | Advance (int places) |
| Steps forward the given number of places in the stream. | |
| int | Length () |
| The length of the string. | |
| string | ReadString (int length) |
| Reads a substring of the given length. Note that this does not do bounds checking. | |
| virtual char | Read () |
| Reads a character from the stream and advances the stream one place. | |
| void | ReadUntil (char character) |
| Keeps reading the given character from the stream until it's no longer next. Used e.g. for stripping a block of whitespace of unknown length from the stream. | |
| void | ReadOff (char[] chars) |
| Keeps reading from the stream until no characters in the given set are next. Used e.g. for stripping an unknown number of newlines (\r or \n) from this stream. | |
| void | ReadOff (char[] chars, out int count) |
| Keeps reading from the stream until no characters in the given set are next. Used e.g. for stripping an unknown number of newlines (\r or \n) from this stream. | |
| int | NextIndexOf (char character) |
| Gets the next index of the given character. The length is returned if it wasn't found at all. | |
| int | NextIndexOf (char character, int limit) |
| Gets the next index of the given character, up to limit. Limit is returned if it wasn't found at all. | |
| virtual int | GetLineNumber () |
| Gets the line number that the pointer is currently at. | |
| int | GetLineNumber (out int charOnLine) |
| Gets the line number and character number that the pointer is currently at. | |
| string | ReadLine (int lineNumber) |
| Reads the numbered line from this stream. | |
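To make the Peek and ReadOff descriptions above concrete, here is a minimal stand-in reader. It is a sketch under assumptions, not Dom.StringReader itself: the class name MiniReader is hypothetical, and it assumes that peeking or reading past the end yields the null character, as the NULL attribute's description suggests.

```csharp
using System;

// Hypothetical stand-in for the reader described above; not PowerUI's code.
public class MiniReader
{
    // Returned when an operation works beyond the end of the input.
    public const char NULL = '\0';

    public readonly string Input;
    public int Position;

    public MiniReader(string input) { Input = input; }

    // True while there is something left to read.
    public bool More() { return Position < Input.Length; }

    // Peek(0) is the next character, Peek(1) the one after that, and so on.
    public char Peek(int delta = 0)
    {
        int index = Position + delta;
        return (index >= 0 && index < Input.Length) ? Input[index] : NULL;
    }

    // Reads one character and advances one place.
    public char Read()
    {
        if (!More()) return NULL;
        return Input[Position++];
    }

    // Skips characters while the next one is in the given set,
    // counting how many were skipped.
    public void ReadOff(char[] chars, out int count)
    {
        count = 0;
        while (More() && Array.IndexOf(chars, Peek()) >= 0)
        {
            Position++;
            count++;
        }
    }
}
```

For example, calling ReadOff with the set { '\r', '\n' } on the input "\r\n\nabc" skips three characters and leaves 'a' as the next character to peek.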
Static Public Member Functions | |
| static bool | IsAsciiLetter (char c) |
| Determines if the given character is an uppercase or lowercase ASCII letter. | |
| static bool | IsSpaceCharacter (char c) |
| True if the given char is any of the HTML5 space characters (includes newlines etc). | |
Static Public Member Functions inherited from Dom.StringReader | |
| static int | NextIndexOf (int position, string input, char character, int limit) |
| Gets the next index of the given character, up to limit. Limit is returned if it wasn't found at all. | |
| static int | NextIndexOf (int position, string input, char character) |
| Gets the next index of the given character. The length is returned if it wasn't found at all. | |
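As an illustration of the two character tests above, the sketch below follows the HTML5 definition of space characters (space, tab, line feed, form feed and carriage return). The class name CharSketch is hypothetical, and whether PowerUI checks exactly this set is an assumption.

```csharp
// Hypothetical sketch; the character sets follow the HTML5 definitions,
// not necessarily PowerUI's implementation.
public static class CharSketch
{
    // True for a-z and A-Z only; accented and non-Latin letters are excluded.
    public static bool IsAsciiLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // HTML5 space characters: U+0020, U+0009, U+000A, U+000C and U+000D.
    public static bool IsSpaceCharacter(char c)
    {
        return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
    }
}
```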
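The "return the limit (or the length) when nothing is found" convention described above can be sketched as follows. This is an illustrative re-implementation with a hypothetical class name, IndexSketch; it assumes a non-negative start position and stops at the end of the input even when the limit is larger.

```csharp
using System;

// Hypothetical sketch of the search convention described above.
public static class IndexSketch
{
    // Searches input from position up to (but not including) limit.
    // Returns the index of the first match, or limit if there is none.
    public static int NextIndexOf(int position, string input, char character, int limit)
    {
        int end = Math.Min(limit, input.Length); // assumption: never read past the input
        for (int i = position; i < end; i++)
        {
            if (input[i] == character) return i;
        }
        return limit;
    }

    // Same search with the input length as the limit.
    public static int NextIndexOf(int position, string input, char character)
    {
        return NextIndexOf(position, input, character, input.Length);
    }
}
```

Returning the limit rather than -1 lets a caller slice "everything up to the match or the end" without a separate not-found branch.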
Public Attributes | |
| HtmlParseMode | State |
| Gets or sets the current parse mode. | |
| MLNamespace | Namespace |
| Current namespace. Defaults to XHTML (for all our HTML tags). | |
| Document | Document |
| Document we're adding to. | |
| readonly List< Element > | OpenElements |
| readonly Stack< int > | TemplateModes |
| readonly List< Element > | FormattingElements |
| int | PreviousMode = HtmlTreeMode.Initial |
| The previous tree mode. | |
| int | CurrentMode = HtmlTreeMode.Initial |
| The current tree mode. | |
| int | TextBlockLength |
| The length of the current text buffer. | |
| System.Text.StringBuilder | Builder = new System.Text.StringBuilder() |
| A string builder used for constructing tokens. | |
| Element | head |
| The head pointer. | |
| Element | form |
| The form pointer. | |
| TextNode | PendingTableCharacters |
| The pending table chars 'list' (we only ever add one to it). | |
| string | LastStartTag |
| The last created start tag name (lowercase). | |
| bool | FramesetOk = true |
| The frameset-ok flag. | |
| bool | _foster = false |
| Table foster parenting. Occurs when tables are mis-nested and affects how elements are added. | |
Public Attributes inherited from Dom.StringReader | |
| string | Input |
| The original string. | |
| int | Position |
| The current position this reader is at in the string. | |
| int | InputLength |
| The length of the input string. | |
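The FormattingElements list works together with AddFormatting, AddScopeMarker and ClearFormatting ("clears formatting info to the last marker"). A rough sketch of that marker behaviour, following the HTML Standard's list of active formatting elements, is shown here. The class FormattingListSketch, the use of tag-name strings and the choice of null as the marker value are all assumptions; how PowerUI represents markers is not stated in this page.

```csharp
using System.Collections.Generic;

// Hypothetical sketch; a null entry stands in for a scope marker.
public class FormattingListSketch
{
    public readonly List<string> Entries = new List<string>();

    // Records a formatting element such as b, i or em.
    public void AddFormatting(string tag) { Entries.Add(tag); }

    // Records a marker, e.g. when entering an object or applet.
    public void AddScopeMarker() { Entries.Add(null); }

    // Removes entries from the end of the list up to and including
    // the last marker, or the whole list if there is no marker.
    public void ClearToLastMarker()
    {
        while (Entries.Count > 0)
        {
            string last = Entries[Entries.Count - 1];
            Entries.RemoveAt(Entries.Count - 1);
            if (last == null) break;
        }
    }
}
```

Adding b, a marker, i and em and then clearing leaves only b in the list, so formatting opened outside the marked element survives.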
Properties | |
| static MLNamespace | XHTMLNamespace [get] |
| The XML namespace for XHTML. | |
| string | CurrentTag [get] |
| The current tag on the top of the stack. | |
| Node | CurrentNode [get] |
| The current open node. | |
| Element | CurrentElement [get] |
| The current open element. | |
Private Member Functions | |
| int | GetAppropriateEnd (out bool closing) |
| Keeps reading until </lastStartTag> is seen. | |
| string | ReadRawTag (bool open, bool withName) |
| Reads the contents of an open/close tag. | |
| void | EndTag () |
| void | OpenPCTag () |
| Comment | FlushCommentNode (int positionDelta) |
| void | LoadComment () |
| Reads a comment's body. | |
| bool | CommentDashEnd () |
| See 8.2.4.49 Comment end dash state. | |
| bool | CommentEnd () |
| Checks if the comment has ended. | |
| void | OpenRCTag () |
| bool | CreateIfAppropriate (char c) |
| Creates a close tag if one is appropriate. | |
| void | BogusComment () |
| See 8.2.4.44 Bogus comment state. | |
| void | HandleText (bool stopAtTag, bool allowVars) |
| Creates a text content block. | |
| void | AddElementWithFoster (Element element) |
| void | InBodyEndTagElse (string close) |
| Any other end tag has been found in the InBody state. | |
| void | FlushComment () |
| Writes out any pending text as a comment node. | |
| TextNode | FlushTextNode () |
| Writes out any pending text to a text element. | |
| TextNode | AppendText (TextNode node, string text) |
| Appends text to the given node or creates a new node if it's null. | |
| void | AddVariable () |
| Reads out a variable (as used by PowerUI for localization purposes). | |
Private Attributes | |
| TextNode | text_ |
| The latest added text node. Gets cleared whenever Process is called. | |
Static Private Attributes | |
| static MLNamespace | _XHTMLNamespace |
| Cached reference for the XHTML namespace. | |
Additional Inherited Members | |
Static Public Attributes inherited from Dom.StringReader | |
| static char | NULL = '\0' |
| The null character. This is returned when operations are working beyond the end of the stream. | |
Member Function Documentation | |
| AdoptionAgencyAlgorithm | tag | The actual tag given. |
| BogusComment | c | The current character. |
| InBodyEndTagElse | tag | The actual tag found. |
| IsAsciiLetter | c | The character to examine. |
| TemplateStep | token | The token to insert. |
| TemplateStep | mode | The mode to push. |