Google

>
 HaXmlParentContentsIndex
Text.XML.HaXml.Lex
Contents
Entry points to the lexer
Token and position types
Description

You don't normally need to use this Lex module directly - it is called automatically by the parser. (This interface is only exposed for debugging purposes.)

This is a hand-written lexer for tokenising the text of an XML document so that it is ready for parsing. It attaches position information in (line,column) format to every token. The main entry point is . A secondary entry point, , is provided for when the parser needs to stuff a string back onto the front of the text and re-tokenise it (typically when expanding macros).

As one would expect, the lexer is essentially a small finite state machine.

Synopsis
:: String -> String -> [Token]
:: Posn -> String -> [Token]
posInNewCxt :: String -> Maybe Posn -> Posn
type Token = (Posn, TokenT)
data Posn = Pn String Int Int (Maybe Posn)
data TokenT
= TokCommentOpen
| TokCommentClose
| TokPIOpen
| TokPIClose
| TokSectionOpen
| TokSectionClose
| TokSection Section
| TokSpecialOpen
| TokSpecial Special
| TokEndOpen
| TokEndClose
| TokAnyOpen
| TokAnyClose
| TokSqOpen
| TokSqClose
| TokEqual
| TokQuery
| TokStar
| TokPlus
| TokAmp
| TokSemi
| TokHash
| TokBraOpen
| TokBraClose
| TokPipe
| TokPercent
| TokComma
| TokQuote
| TokName String
| TokFreeText String
| TokNull
data Special
= DOCTYPEx
| ELEMENTx
| ATTLISTx
| ENTITYx
| NOTATIONx
data Section
= CDATAx
| INCLUDEx
| IGNOREx
Entry points to the lexer
:: String -> String -> [Token]
The first argument to is the filename (used for source positions, especially in error messages), and the second is the string content of the XML file.
:: Posn -> String -> [Token]
is used when the parser expands a macro (PE reference). The expansion of the macro must be re-lexed as if for the first time.
posInNewCxt :: String -> Maybe Posn -> Posn
posInNewCxt name pos creates a new source position from an old one. It is used when opening a new file (e.g. a DTD inclusion), to denote the start of the file name, but retain the stacked information that it was included from the old pos.
Token and position types
type Token = (Posn, TokenT)
All tokens are paired up with a source position.
data Posn
Source positions contain a filename, line, column, and an inclusion point, which is itself another source position, recursively.
Constructors
Pn String Int Int (Maybe Posn)
Instances
Show Posn
data TokenT
The basic token type.
Constructors
TokCommentOpen <!--
TokCommentClose -->
TokPIOpen <?
TokPIClose ?>
TokSectionOpen <![
TokSectionClose ]]>
TokSection Section CDATA INCLUDE IGNORE etc
TokSpecialOpen <!
TokSpecial Special DOCTYPE ELEMENT ATTLIST etc
TokEndOpen </
TokEndClose />
TokAnyOpen <
TokAnyClose >
TokSqOpen [
TokSqClose ]
TokEqual =
TokQuery ?
TokStar *
TokPlus +
TokAmp &
TokSemi ;
TokHash #
TokBraOpen (
TokBraClose )
TokPipe |
TokPercent %
TokComma ,
TokQuote '' or ""
TokName String begins with letter
TokFreeText String any character data
TokNull fake token
Instances
Show TokenT
data Special
Constructors
DOCTYPEx
ELEMENTx
ATTLISTx
ENTITYx
NOTATIONx
data Section
Constructors
CDATAx
INCLUDEx
IGNOREx
Produced by Haddock version 0.4