11. Parsers |
Module Parser.XML |
string autoconvert(string xml)
CLASS Parser.XML.Simple |
array parse(string xml, string context, function cb, mixed ... extra_args)
array parse(string xml, function cb, mixed ... extra_args)
The context argument was introduced in Pike 7.8.
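A minimal sketch of driving parse() with a callback. The callback argument order used here (type, name, and then the remaining arguments) is an assumption modelled on the validate() signature documented for Parser.XML.Validating; trace_events is a hypothetical helper name.

```pike
// Collect one line per callback invocation so the event order can be
// inspected. ASSUMPTION: the first two callback arguments are the event
// type and the name; everything else is swallowed by the varargs.
array(string) trace_events(string xml)
{
  array(string) events = ({});
  Parser.XML.Simple()->parse(xml,
    lambda(string type, mixed name, mixed ... rest) {
      events += ({ sprintf("%O %O", type, name) });
      return 0;
    });
  return events;
}

int main()
{
  // Hypothetical input document.
  write("%{%s\n%}", trace_events("<doc><item id='1'>text</item></doc>"));
  return 0;
}
```

Printing the raw events first is a cheap way to learn which callback types a given document produces before writing a real handler.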
mixed parse_dtd(string dtd, string context, function cb, mixed ... extras)
mixed parse_dtd(string dtd, function cb, mixed ... extras)
The context argument was introduced in Pike 7.8.
string lookup_entity(string entity)
Returns the verbatim expansion of the entity.
Added in Pike 7.7.
void define_entity_raw(string entity, string raw)
Define an entity or an SMEG.
Entity name, or SMEG name (if preceded by a "%").
Verbatim expansion of the entity.
define_entity()
void define_entity(string entity, string s, function cb, mixed ... extras)
Define an entity or an SMEG.
Entity name, or SMEG name (if preceded by a "%").
Expansion of the entity. Entity evaluation will be performed.
define_entity_raw()
void allow_rxml_entities(int(0..1) yes_no)
void compat_allow_errors(string version)
Set whether the parser should allow certain errors for compatibility with earlier versions. version can be:
|
version can also be zero to enable all error checks.
CLASS Parser.XML.Simple.Context |
mixed parse_xml()
mixed parse_dtd()
string parse_entity()
void push_string(string s)
void push_string(string s, string context)
Add a string to parse at the current position.
String to insert at the current parsing position.
Optional context used to refer to the inserted string.
This is typically a URL, but may also be an entity (preceded by an "&") or an SMEG reference (preceded by a "%").
Not used by the XML parser as such, but simply passed into the callbackinfo mapping as the field "context", where it can be useful e.g. for resolving relative URLs when parsing DTDs, or for determining where errors occur.
The context argument was introduced in Pike 7.8.
void Parser.XML.Simple.Context(string s, string context, int flags, function cb, mixed ... extra_args)
void Parser.XML.Simple.Context(string s, int flags, function cb, mixed ... extra_args)
These two arguments are passed along to push_string().
Parser flags.
Callback function. This function gets called at various stages during the parsing.
The context argument was introduced in Pike 7.8.
CLASS Parser.XML.Validating |
Validating XML parser.
Validates an XML file according to a DTD.
cf. http://www.w3.org/TR/REC-xml/
inherit .Simple : Simple
Extends the Simple XML parser.
int isname(string s)
Check if s is a valid Name.
int isnmtoken(string s)
Check if s is a valid Nmtoken.
int isnames(string s)
Check if s is a valid list of Names.
int isnmtokens(string s)
Check if s is a valid list of Nmtokens.
string get_external_entity(string sysid, string|void pubid, mapping|__deprecated__(int)|void info, mixed ... extra)
Get an external entity.
Called when a <!DOCTYPE> with a SYSTEM identifier is encountered, or when an entity reference needs expanding.
The SYSTEM identifier.
The PUBLIC identifier (if any).
The callbackinfo mapping containing the current parser state.
The extra arguments as passed to parse() or parse_dtd().
Returns a string with a DTD fragment on success.
Returns 0 (zero) on failure.
Returning zero will cause the validator to report an error.
In Pike 7.7 and earlier info had the value 0 (zero).
The default implementation always returns 0 (zero).
Override this function to provide other behaviour.
parse() , parse_dtd()
mixed validate(string kind, string name, mapping attributes, array|string contents, mapping(string:mixed) info, function(string:mixed) callback, array(mixed) extra)
The validation callback function.
::parse()
array parse(string data, string|function(string:mixed) callback, mixed ... extra)
Document this function
array parse_dtd(string data, string|function(string:mixed) callback, mixed ... extra)
Document this function
CLASS Parser.XML.Validating.Element |
XML Element node.
Module Parser.XML.Tree |
XML parser that generates node-trees.
Has some support for XML namespaces http://www.w3.org/TR/REC-xml-names/ RFC 2518 23.4.
This module defines two sets of node trees: the SimpleNode-based and the Node-based. The main difference between the two is that the Node-based trees have parent pointers, which tend to generate circular data references and thus garbage.
There are some more subtle differences between the two. Please read the documentation carefully.
constant Parser.XML.Tree.STOP_WALK
constant Parser.XML.Tree.XML_ROOT
constant Parser.XML.Tree.XML_ELEMENT
constant Parser.XML.Tree.XML_TEXT
constant Parser.XML.Tree.XML_HEADER
constant Parser.XML.Tree.XML_PI
constant Parser.XML.Tree.XML_COMMENT
constant Parser.XML.Tree.XML_DOCTYPE
constant Parser.XML.Tree.XML_ATTR
Attribute nodes are created on demand
constant Parser.XML.Tree.DTD_ENTITY
constant Parser.XML.Tree.DTD_ELEMENT
constant Parser.XML.Tree.DTD_ATTLIST
constant Parser.XML.Tree.DTD_NOTATION
constant Parser.XML.Tree.XML_NODE
string text_quote(string data)
Quotes the string given in data by escaping &, < and >.
string roxen_text_quote(string data)
Quotes strings just like text_quote , but entities in the form &foo.bar; will not be quoted.
string attribute_quote(string data)
Quotes the string given in data by escaping &, <, >, ' and ".
string roxen_attribute_quote(string data)
Quotes strings just like attribute_quote , but entities in the form &foo.bar; will not be quoted.
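A short sketch of how the quoting functions might be used when assembling markup by hand. make_element and the sample strings are hypothetical; only text_quote() and attribute_quote() come from this module.

```pike
// Build a small element by hand, quoting the attribute value and the
// text content with the function intended for each position.
string make_element(string name, string attr_value, string text)
{
  return sprintf("<%s title=\"%s\">%s</%s>",
                 name,
                 Parser.XML.Tree.attribute_quote(attr_value),
                 Parser.XML.Tree.text_quote(text),
                 name);
}

int main()
{
  // Prints: a &lt; b &amp; c
  write("%s\n", Parser.XML.Tree.text_quote("a < b & c"));
  write("%s\n", make_element("note", "Tom & \"Jerry\"", "1 < 2"));
  return 0;
}
```

Using attribute_quote() for attribute values matters because quote characters are left untouched by text_quote().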
SimpleRootNode simple_parse_input(string data, void|mapping predefined_entities, ParseFlags|void flags, string|void default_namespace)
Takes an XML string and produces a SimpleNode tree.
SimpleRootNode simple_parse_file(string path, void|mapping predefined_entities, ParseFlags|void flags, string|void default_namespace)
Loads the XML file path , creates a SimpleNode tree representation and returns the root node.
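As an illustration, a SimpleNode tree can be built from a string and then queried with the VirtualNode accessors. This is a sketch only; list_items and the element names "doc" and "item" are hypothetical.

```pike
// Return ({ id, text }) pairs for every <item> directly below <doc>.
array(array(string)) list_items(string xml)
{
  object root = Parser.XML.Tree.simple_parse_input(xml);
  object doc = root->get_first_element("doc");
  if (!doc) return ({});

  array(array(string)) res = ({});
  foreach (doc->get_elements("item"), object item)
    res += ({ ({ item->get_attributes()->id, item->value_of_node() }) });
  return res;
}

int main()
{
  string xml = "<doc><item id='1'>one</item><item id='2'>two</item></doc>";
  foreach (list_items(xml), array(string) pair)
    write("%s: %s\n", pair[0], pair[1]);
  return 0;
}
```

simple_parse_file() can be used the same way when the document lives on disk.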
RootNode parse_input(string data, void|int(0..1) no_fallback, void|int(0..1) force_lowercase, void|mapping(string:string) predefined_entities, void|int(0..1) parse_namespaces, ParseFlags|void flags)
Takes an XML string and produces a node tree.
flags is not used for PARSE_WANT_ERROR_CONTEXT , PARSE_FORCE_LOWERCASE or PARSE_ENABLE_NAMESPACES since they are covered by the separate flag arguments.
Node parse_file(string path, int(0..1)|void parse_namespaces)
Loads the XML file path , creates a node tree representation and returns the root node.
ENUM Parser.XML.Tree.ParseFlags |
Flags used together with simple_parse_input() and simple_parse_file() .
CLASS Parser.XML.Tree.XMLNSParser |
Namespace aware parser.
mapping(string:string) Enter(mapping(string:string) attrs)
Check attrs for namespaces.
Returns the namespace expanded version of attrs .
CLASS Parser.XML.Tree.AbstractSimpleNode |
Base class for nodes.
array(AbstractSimpleNode) get_children()
Returns all the node's children.
int count_children()
Returns the number of children of the node.
AbstractSimpleNode low_clone()
Returns an initialized copy of the node.
The returned node has no children.
AbstractSimpleNode clone()
Returns a clone of the sub-tree rooted in the node.
AbstractSimpleNode get_last_child()
Returns the last child node or zero.
AbstractSimpleNode `[](int pos)
The [] operator indexes among the node's children, so node[0] returns the first node and node[-1] the last.
The [] operator will select a node from all the node's children, not just its element children.
AbstractSimpleNode add_child(AbstractSimpleNode c)
Adds the given node to the list of children of this node. The new node is added last in the list.
The return value differs from the one returned by Node()->add_child().
The current node.
AbstractSimpleNode add_child_before(AbstractSimpleNode c, AbstractSimpleNode old)
Adds the node c to the list of children of this node. The node is added before the node old , which is assumed to be an existing child of this node. The node is added last if old is zero.
The current node.
AbstractSimpleNode add_child_after(AbstractSimpleNode c, AbstractSimpleNode old)
Adds the node c to the list of children of this node. The node is added after the node old , which is assumed to be an existing child of this node. The node is added first if old is zero.
The current node.
void remove_child(AbstractSimpleNode c)
Removes all occurrences of the provided node from the list of children of this node.
void replace_children(array(AbstractSimpleNode) children)
Replaces the node's children with the provided ones.
AbstractSimpleNode replace_child(AbstractSimpleNode old, AbstractSimpleNode new)
Replaces the first occurrence of the old node child with the new node child.
The return value differs from the one returned by Node()->replace_child().
Returns the current node on success, and 0 (zero) if the node old wasn't found.
void zap_tree()
Destruct the tree recursively. When the inheriting AbstractNode or Node is used, which have parent pointers, this function should be called for every tree that no longer is in use to avoid frequent garbage collector runs.
int walk_preorder(function(AbstractSimpleNode:int|void) callback, mixed ... args)
Traverse the node subtree in preorder, root node first, then subtrees from left to right, calling the callback function for every node. If the callback function returns STOP_WALK the traverse is promptly aborted and STOP_WALK is returned.
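A sketch of a preorder walk, first over the whole tree and then aborted early with STOP_WALK. element_names and find_first are hypothetical helpers, and the input document is made up for the example.

```pike
// Collect the names of all element nodes in preorder.
array(string) element_names(string xml)
{
  array(string) names = ({});
  object root = Parser.XML.Tree.simple_parse_input(xml);
  root->walk_preorder(lambda(object n) {
    if (n->get_node_type() == Parser.XML.Tree.XML_ELEMENT)
      names += ({ n->get_tag_name() });
    return 0;  // Anything other than STOP_WALK continues the walk.
  });
  return names;
}

// Return the first element named target, aborting the walk once found.
object find_first(object root, string target)
{
  object found;
  root->walk_preorder(lambda(object n) {
    if (n->get_node_type() == Parser.XML.Tree.XML_ELEMENT &&
        n->get_tag_name() == target) {
      found = n;
      return Parser.XML.Tree.STOP_WALK;
    }
    return 0;
  });
  return found;
}

int main()
{
  // Prints: a b c d
  write("%s\n", element_names("<a><b><c/></b><d/></a>") * " ");
  return 0;
}
```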
int walk_preorder_2(function(AbstractSimpleNode:int|void) cb_1, function(AbstractSimpleNode:int|void) cb_2, mixed ... args)
Traverse the node subtree in preorder, root node first, then subtrees from left to right. For each node we call cb_1 before iterating through children, and then cb_2 (which always gets called even if the walk is aborted earlier). If the callback function returns STOP_WALK the traversal descent is aborted and STOP_WALK is returned once all waiting cb_2 functions have been called.
int walk_inorder(function(AbstractSimpleNode:int|void) callback, mixed ... args)
Traverse the node subtree in inorder, left subtree first, then root node, and finally the remaining subtrees, calling the function callback for every node. If the function callback returns STOP_WALK the traverse is promptly aborted and STOP_WALK is returned.
int walk_postorder(function(AbstractSimpleNode:int|void) callback, mixed ... args)
Traverse the node subtree in postorder, first subtrees from left to right, then the root node, calling the function callback for every node. If the function callback returns STOP_WALK the traverse is promptly aborted and STOP_WALK is returned.
int iterate_children(function(AbstractSimpleNode:int|void) callback, mixed ... args)
Iterates over the node's children from left to right, calling the function callback for every node. If the callback function returns STOP_WALK the iteration is promptly aborted and STOP_WALK is returned.
array(AbstractSimpleNode) get_descendants(int(0..1) include_self)
Returns a list of all descendants in document order. Includes this node if include_self is set.
CLASS Parser.XML.Tree.AbstractNode |
Base class for nodes with parent pointers.
inherit AbstractSimpleNode : AbstractSimpleNode
void set_parent(AbstractNode parent)
Sets the parent node to parent .
AbstractNode get_parent()
Returns the parent node.
AbstractNode low_clone()
Returns an initialized copy of the node.
The returned node has no children, and no parent.
AbstractNode clone(void|int(-1..1) direction)
Clones the node, optionally connected to parts of the tree. If direction is -1 the cloned node's parent will be set, if direction is 1 the cloned node's children will be set.
AbstractNode get_root()
Follows all parent pointers and returns the root node.
AbstractNode add_child(AbstractNode c)
Adds the node c to the list of children of this node. The new node is added last in the list.
Returns the new child node, NOT the current node.
The new child node is returned.
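The difference in return value between the two add_child() variants is easy to trip over when chaining calls. The sketch below assumes that SimpleNode and Node can be instantiated directly with the VirtualNode(type, name, attr, text) constructor arguments; add_child_results and the element names are hypothetical.

```pike
// Returns ({ simple_returns_parent, node_returns_child }).
array(int) add_child_results()
{
  // ASSUMPTION: direct construction with (type, name, attr, text).
  object sp = Parser.XML.Tree.SimpleNode(Parser.XML.Tree.XML_ELEMENT,
                                         "parent", ([]), "");
  object sc = Parser.XML.Tree.SimpleNode(Parser.XML.Tree.XML_ELEMENT,
                                         "child", ([]), "");
  object np = Parser.XML.Tree.Node(Parser.XML.Tree.XML_ELEMENT,
                                   "parent", ([]), "");
  object nc = Parser.XML.Tree.Node(Parser.XML.Tree.XML_ELEMENT,
                                   "child", ([]), "");

  // SimpleNode: the node the call was made on comes back.
  // Node: the newly added child comes back.
  return ({ sp->add_child(sc) == sp, np->add_child(nc) == nc });
}

int main()
{
  write("%O\n", add_child_results());
  return 0;
}
```

Code that is ported between the two tree types should therefore avoid relying on the value returned by add_child().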
AbstractNode add_child_before(AbstractNode c, AbstractNode old)
Adds the node c to the list of children of this node. The node is added before the node old , which is assumed to be an existing child of this node. The node is added last if old is zero.
The current node.
AbstractNode add_child_after(AbstractNode c, AbstractNode old)
Adds the node c to the list of children of this node. The node is added after the node old , which is assumed to be an existing child of this node. The node is added first if old is zero.
The current node.
AbstractNode tmp_add_child(AbstractNode c)
AbstractNode tmp_add_child_before(AbstractNode c, AbstractNode old)
AbstractNode tmp_add_child_after(AbstractNode c, AbstractNode old)
Variants of add_child, add_child_before and add_child_after that don't set the parent pointer in the newly added children.
This is useful while building a node tree, to get efficient refcount garbage collection if the build stops abruptly. fix_tree has to be called on the root node when the building is done.
void fix_tree()
Fix all parent pointers recursively in a tree that has been built with tmp_add_child .
void remove_child(AbstractNode c)
Removes all occurrences of the provided node from the called node's list of children. The removed node's parent reference is set to null.
void remove_node()
Removes this node from its parent. The parent reference is set to null.
void replace_children(array(AbstractNode) children)
Replaces the node's children with the provided ones. All parent references are updated.
AbstractNode replace_child(AbstractNode old, AbstractNode new)
Replaces the first occurrence of the old node child with the new node child. All parent references are updated.
The returned value is NOT the current node.
Returns the new child node.
AbstractNode replace_node(AbstractNode new)
Replaces this node with the provided one.
Returns the new node.
array(AbstractNode) get_preceding_siblings()
Returns all preceding siblings, i.e. all siblings present before this node in the parent's children list.
array(AbstractNode) get_following_siblings()
Returns all following siblings, i.e. all siblings present after this node in the parent's children list.
array(AbstractNode) get_siblings()
Returns all siblings, including this node.
array(AbstractNode) get_ancestors(int(0..1) include_self)
Returns a list of all ancestors, with the top node last. The list will start with this node if include_self is set.
array(AbstractNode) get_preceding()
Returns all preceding nodes, excluding this node's ancestors.
array(AbstractNode) get_following()
Returns all the nodes that follow after the current one.
CLASS Parser.XML.Tree.VirtualNode |
Node in XML tree
mapping(string:string) get_attributes()
Returns this node's attributes, which can be altered destructively to alter the node's attributes.
mapping get_short_attributes()
Returns this node's namespace-adjusted attributes.
set_short_namespaces() must have been called before calling this function.
int get_node_type()
Returns the node type. See defined node type constants.
string get_text()
Returns text content in node.
int get_doc_order()
void set_doc_order(int o)
string get_tag_name()
Returns the name of the element node, or the nearest element above if an attribute node.
string get_any_name()
Return name of tag or name of attribute node.
void set_tag_name(string name)
Change the tag name destructively. Can only be used on element and processing-instruction nodes.
string get_namespace()
Return the (resolved) namespace for this node.
string get_full_name()
Return fully qualified name of the element node.
void Parser.XML.Tree.VirtualNode(int type, string name, mapping attr, string text)
string value_of_node()
If the node is an attribute node or a text node, its value is returned. Otherwise the child text nodes are concatenated and returned.
AbstractNode get_first_element(string|void name, int(0..1)|void full)
Returns the first element child to this node.
If provided, the first element child with that name is returned.
If specified, name matching will be done against the full name.
Returns the first matching node, and 0 if no such node was found.
array(AbstractNode) get_elements(string|void name, int(0..1)|void full)
Returns all element children to this node.
If provided, only elements with that name are returned.
If specified, name matching will be done against the full name.
Returns an array with matching nodes.
mixed cast(string to)
It is possible to cast a node to a string, which will return render_xml() for that node.
string render_xml(void|int(0..1) preserve_roxen_entities, void|mapping(string:string) namespace_lookup)
Creates an XML representation of the node sub tree. If the flag preserve_roxen_entities is set, entities of the form &foo.bar; will not be escaped.
Mapping from namespace prefix to namespace symbol prefix.
void render_to_file(Stdio.File f, void|int(0..1) preserve_roxen_entities)
Creates an XML representation of the node sub tree and streams the output to the file f. If the flag preserve_roxen_entities is set, entities of the form &foo.bar; will not be escaped.
CLASS Parser.XML.Tree.SimpleNode |
XML node without parent pointers and attribute nodes.
inherit AbstractSimpleNode : AbstractSimpleNode
inherit VirtualNode : VirtualNode
CLASS Parser.XML.Tree.Node |
XML node with parent pointers.
inherit AbstractNode : AbstractNode
inherit VirtualNode : VirtualNode
string get_tag_name()
Returns the name of the element node, or the nearest element above if an attribute node.
string get_attr_name()
Returns the name of the attribute node.
array(Node) get_attribute_nodes()
Creates and returns an array of new nodes; they will not be added as proper children to the parent node, but the parent link in the nodes are set so that upwards traversal is made possible.
CLASS Parser.XML.Tree.XMLParser |
Mixin for parsing XML.
this_program node_factory(int type, string name, mapping attr, string text)
Factory for creating nodes.
Type of node to create. One of:
|
Name of the tag if applicable.
Attributes for the tag if applicable.
Contained text of the tag if any.
This function is called during parsing to create the various XML nodes.
Overload this function to provide application-specific XML nodes.
Returns a node object representing the XML tag, or 0 (zero) if the subtree rooted in the tag should be cut.
This function is not available in Pike 7.6 and earlier.
CLASS Parser.XML.Tree.SimpleRootNode |
The root node of an XML-tree consisting of SimpleNode s.
inherit SimpleNode : SimpleNode
inherit XMLParser : XMLParser
SimpleElementNode get_element_by_id(string id, int|void force)
Find the element with the specified id.
The XML id of the node to search for.
Force a regeneration of the id lookup cache. Needed the first time after the node tree has been modified by adding or removing element nodes, or by changing the id attribute of an element node.
Returns the element node with the specified id if any. Returns UNDEFINED otherwise.
flush_node_id_cache
void flush_node_id_cache()
Clears the node id cache built and used by get_element_by_id .
CLASS Parser.XML.Tree.RootNode |
The root node of an XML-tree consisting of Node s.
inherit Node : Node
inherit XMLParser : XMLParser
ElementNode get_element_by_id(string id, int|void force)
Find the element with the specified id.
The XML id of the node to search for.
Force a regeneration of the id lookup cache. Needed the first time after the node tree has been modified by adding or removing element nodes, or by changing the id attribute of an element node.
Returns the element node with the specified id if any. Returns UNDEFINED otherwise.
flush_node_id_cache
void flush_node_id_cache()
Clears the node id cache built and used by get_element_by_id .
Module Parser.XML.NSTree |
A namespace aware version of Parser.XML.Tree. This implementation does as little validation as possible, so e.g. you can call your namespace xmlfoo without complaints.
inherit Parser.XML.Tree : Tree
NSNode parse_input(string data, void|string default_ns)
Takes an XML string data and produces a namespace node tree. If default_ns is given, it will be used as the default namespace.
Throws an error when an error is encountered during XML parsing.
string visualize(Node n, void|string indent)
Makes a visualization of a node graph suitable for printing out on a terminal.
> object x = parse_input("<a><b><c/>d</b><b><e/><f>g</f></b></a>");
> write(visualize(x));
Node(ROOT)
  NSNode(ELEMENT,"a")
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"c")
      NSNode(TEXT)
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"e")
      NSNode(ELEMENT,"f")
        NSNode(TEXT)
Result 1: 201
CLASS Parser.XML.NSTree.NSNode |
Namespace aware node.
inherit Node : Node
string get_ns()
Returns the namespace in which the current element is defined.
string get_default_ns()
Returns the default namespace in the current scope.
mapping(string:string) get_defined_nss()
Returns a mapping with all the namespaces defined in the current scope, except the default namespace.
The returned mapping is the same as the one in the node, so destructive changes will affect the node.
mapping(string:string) get_ns_attributes(string namespace)
Returns the attributes in this node that are declared in the provided namespace.
mapping(string:mapping(string:string)) get_ns_attributes()
Returns all the attributes in all namespaces that are associated with this node.
The returned mapping is the same as the one in the node, so destructive changes will affect the node.
void add_namespace(string ns, void|string symbol, void|int(0..1) chain)
Adds a new namespace to this node. The preferred symbol to use to identify the namespace can be provided in the symbol argument. If chain is set, no attempts to overwrite an already defined namespace with the same identifier will be made.
mapping(string:string) diff_namespaces()
Returns the difference between this node's and its parent's namespaces.
string get_xml_name()
Returns the element name as it occurs in xml files. E.g. "zonk:name" for the element "name" defined in a namespace denoted with "zonk". It will look up a symbol for the namespace in the symbol tables for the node and its parents. If none is found a new label will be generated by hashing the namespace.
void remove_child(NSNode child)
remove_child has not been updated to take care of namespace issues. To properly remove all the parent's namespaces from the child, call remove_node in the child.
Module Parser.XML.SloppyDOM |
A somewhat DOM-like library that implements lazy generation of the node tree, i.e. it's generated from the data upon lookup. There's also a little bit of XPath evaluation to do queries on the node tree.
Implementation note: This is generally more pragmatic than Parser.XML.DOM , meaning it's not so pretty and compliant, but more efficient.
Implementation status: There's only enough implemented to parse a node tree from source and access it, i.e. modification functions aren't implemented. Data hiding stuff like NodeList and NamedNodeMap is not implemented, partly since it's cumbersome to meet the "live" requirement. Also, Parser.HTML is used in XML mode to parse the input. Thus it's too error tolerant to be XML compliant, and it currently doesn't handle DTD elements, like "<!DOCTYPE", or the XML declaration (i.e. "<?xml version='1.0'?>").
Document parse(string source, void|int raw_values)
Normally entities are decoded, and Node.xml_format will encode them again. If raw_values is nonzero then all text and attribute values are instead kept in their original form.
CLASS Parser.XML.SloppyDOM.Node |
Basic node.
string get_text_content()
If the raw_values flag is set in the owning document, the text is returned with entities and CDATA blocks intact.
parse
mapping(string:string)|Node|array(mapping(string:string)|Node)|string simple_path(string path, void|int xml_format)
Access a node or a set of nodes through an expression that is a subset of an XPath RelativeLocationPath in abbreviated form.
That means one or more Steps separated by "/" or "//". A Step consists of an AxisSpecifier followed by a NodeTest and then optionally by one or more Predicates.
"/" before a Step causes it to be matched only against the immediate children of the node(s) selected by the previous Step. "//" before a Step causes it to be matched against any children in the tree below the node(s) selected by the previous Step. The initial selection before the first Step is this element.
The currently allowed AxisSpecifier NodeTest combinations are:
name to select all elements with the given name. The name can be "*" to select all.
@name to select all attributes with the given name. The name can be "*" to select all.
comment() to select all comments.
text() to select all text and CDATA blocks. Note that all entity references are also selected, under the assumption that they would expand to text only.
processing-instruction("name") to select all processing instructions with the given name. The name can be left out to select all. Either ' or " may be used to delimit the name. For compatibility, it can also occur without surrounding quotes.
node() to select all nodes, i.e. the whole content of an element node.
. to select the currently selected element itself.
A Predicate has the form [PredicateExpr] where PredicateExpr currently can be in any of the following forms:
An integer indexes one item in the selected set, according to the document order. A negative index counts from the end of the set.
A RelativeLocationPath as specified above. It's executed for each element in the selected set and those where it yields an empty result are filtered out while the rest remain in the set.
A RelativeLocationPath as specified above followed by ="value". The path is executed for each element in the selected set and those where the text result of it is equal to the given value remain in the set. Either ' or " may be used to delimit the value.
If xml_format is nonzero, the return value is an xml formatted string of all the matched nodes, in document order. Otherwise the return value is as follows:
Attributes are returned as one or more index/value pairs in a mapping. Other nodes are returned as the node objects. If the expression is of a form that can give at most one answer (i.e. there's a predicate with an integer index) then a single mapping or node is returned, or zero if there was no match. If the expression can give more answers then the return value is an array containing zero or more attribute mappings and/or nodes. The array follows document order.
Not DOM compliant.
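A rough sketch of how simple_path() might be called on a parsed document. It assumes that the path is evaluated relative to the Document object, so that the first Step names the top level element; book_summary and the element names are hypothetical.

```pike
// Returns ({ number_of_book_nodes, titles_as_xml_string }).
array book_summary(string source)
{
  object doc = Parser.XML.SloppyDOM.parse(source);

  // Without xml_format, a path that may match several nodes yields an
  // array of node objects in document order.
  array books = doc->simple_path("library/book");

  // With xml_format set, the matches come back as one XML string.
  string titles = doc->simple_path("library/book/title", 1);

  int count = sizeof(books);
  destruct(doc);  // The node tree is likely cyclic, so free it explicitly.
  return ({ count, titles });
}

int main()
{
  string src =
    "<library>"
    "<book id='a'><title>First</title></book>"
    "<book id='b'><title>Second</title></book>"
    "</library>";
  write("%O\n", book_summary(src));
  return 0;
}
```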
string xml_format()
Returns the formatted XML that corresponds to the node tree.
Not DOM compliant.
CLASS Parser.XML.SloppyDOM.NodeWithChildElements |
Node with child elements.
inherit NodeWithChildren : NodeWithChildren
array(Element) get_elements(string name)
Lightweight variant of get_elements_by_tag_name that returns a simple array instead of a fancy live NodeList.
Not DOM compliant.
array(Element) get_descendant_elements()
Returns all descendant elements in document order.
Not DOM compliant.
array(Node) get_descendant_nodes()
Returns all descendant nodes (except attribute nodes) in document order.
Not DOM compliant.
CLASS Parser.XML.SloppyDOM.Document |
The node tree is very likely a cyclic structure, so it might be a good idea to destruct it when you're finished with it, to avoid garbage. Destructing the Document object always destroys all nodes in it.
inherit NodeWithChildElements : NodeWithChildElements
array(Element) get_elements(string name)
Note that this one looks among the top level elements, as opposed to get_elements_by_tag_name . This means that if the document is correct, you can only look up the single top level element here.
Not DOM compliant.
int get_raw_values()
Not DOM compliant.
CLASS Parser.HTML |
This is a simple parser for SGML structured markups. It's not really HTML, but it's useful for that purpose.
The simple way to use it is to give it some information about available tags and containers, and which callbacks those are to call.
The object is easily reused, by calling the clone() function.
add_tag , add_container , finish
Parser.HTML _set_tag_callback(function|string|array to_call)
Parser.HTML _set_entity_callback(function|string|array to_call)
Parser.HTML _set_data_callback(function|string|array to_call)
These functions set up the parser object to call the given callbacks upon tags, entities and/or data. The callbacks will only be called if there isn't another tag/container/entity handler for these.
The callback function will be called with the parser object as first argument, and the active string as second. Note that no parsing of the contents has been done. Both endtags and normal tags are called; there is no container parsing.
The return values from the callbacks are handled in the same way as the return values from callbacks registered with add_tag and similar functions.
The data callback will be called as seldom as possible with the longest possible string, as long as it doesn't get called out of order with any other callback. It will never be called with a zero length string.
If a string or array is given instead of a function, it will act as the return value from the function. Arrays or empty strings are probably preferable to avoid recursion.
Returns the object being called.
Parser.HTML add_tag(string name, mixed to_do)
Parser.HTML add_container(string name, mixed to_do)
Parser.HTML add_entity(string entity, mixed to_do)
Parser.HTML add_quote_tag(string name, mixed to_do, string end)
Parser.HTML add_tags(mapping(string:mixed) tags)
Parser.HTML add_containers(mapping(string:mixed) containers)
Parser.HTML add_entities(mapping(string:mixed) entities)
Registers the actions to take when parsing various things. Tags, containers, entities are as usual. add_quote_tag() adds a special kind of tag that reads any data until the next occurrence of the end string immediately before a tag end.
This argument can be any of the following.
|
The callback function can return:
|
Returns the object being called.
tags , containers , entities
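A minimal sketch of registering a container callback. Two things are assumptions here, since the tables describing them are not reproduced above: that a container callback receives the parser, the argument mapping and the content, and that returning an array inserts its elements into the output without parsing them again. shout_bold is a hypothetical helper.

```pike
// Upper-case the content of every <b>...</b> container and drop the tags.
string shout_bold(string html)
{
  Parser.HTML p = Parser.HTML();
  p->add_container("b",
    lambda(Parser.HTML parser, mapping args, string content) {
      // ASSUMPTION: an array result is emitted as-is, not reparsed.
      return ({ upper_case(content) });
    });
  return p->finish(html)->read();
}

int main()
{
  // Prints: x BOLD y
  write("%s\n", shout_bold("x <b>bold</b> y"));
  return 0;
}
```

Returning an array rather than a plain string is the conservative choice when the replacement text could itself contain markup that the parser would otherwise pick up again.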
Parser.HTML clear_tags()
Parser.HTML clear_containers()
Parser.HTML clear_entities()
Parser.HTML clear_quote_tags()
Removes all registered definitions in the different categories.
Returns the object being called.
add_tag , add_tags , add_container , add_containers , add_entity , add_entities
mapping(string:mixed) tags()
mapping(string:mixed) containers()
mapping(string:mixed) entities()
Returns the current callback settings. When matching is done case insensitively, all names will be returned in lowercase.
Implementation note: These run in constant time since they return copy-on-write mappings.
add_tag , add_tags , add_container , add_containers , add_entity , add_entities
mapping(string:array(mixed|string)) quote_tags()
Returns the current callback settings. The values are arrays ({callback, end_quote}). When matching is done case insensitively, all names will be returned in lowercase.
Implementation note: quote_tags() allocates a new mapping for every call and thus, unlike e.g. tags() runs in linear time.
add_quote_tag
Parser.HTML feed()
Parser.HTML feed(string s, void|int do_parse)
Feed new data to the Parser.HTML object. This will start a scan and may result in callbacks. Note that it's possible that all data fed isn't processed - to do that, call finish() .
If the function is called without arguments, no data is fed, but the parser is run. If the string argument is followed by a 0, as in ->feed(s,0);, the string is fed, but the parser isn't run.
Returns the object being called.
finish , read , feed_insert
Parser.HTML feed_insert(string s)
This pushes a string on the parser stack.
Returns the object being called.
Don't use!
Parser.HTML finish()
Parser.HTML finish(string s)
Finish a parser pass. A string may be sent here, similar to feed().
Returns the object being called.
string|array(mixed) read()
string|array(mixed) read(int max_elems)
Read parsed data from the parser object.
Returns a string of parsed data if the parser isn't in mixed_mode , an array of arbitrary data otherwise.
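The feed(), finish() and read() calls are typically combined as sketched below when input arrives in pieces. parse_chunks is a hypothetical helper; with no callbacks registered, the parser is expected to pass the data through unchanged.

```pike
// Feed the input piece by piece, then flush and collect the output.
string parse_chunks(array(string) chunks)
{
  Parser.HTML p = Parser.HTML();
  foreach (chunks, string chunk)
    p->feed(chunk);  // May leave trailing data unprocessed.
  p->finish();       // Forces the remaining data through the parser.
  return p->read();
}

int main()
{
  // Prints: <p>Hello world</p>
  write("%s\n", parse_chunks(({ "<p>Hello ", "world</p>" })));
  return 0;
}
```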
Parser.HTML write_out(mixed ... args)
Send data to the output stream, i.e. it won't be parsed and it won't be sent to the data callback, if any.
Any data is allowed when the parser is running in mixed_mode . Only strings are allowed otherwise.
Returns the object being called.
array(int) at()
int at_line()
int at_char()
int at_column()
Returns the current position. Characters and columns count from
0
, lines count from 1
.
at() gives an array with the following layout.
|
string current()
Gives the current range of data, ie the whole tag/entity/etc being parsed in the current callback. Returns zero if there's no current range, i.e. when the function is not called in a callback.
string tag_name()
Gives the name of the current tag, or zero. If used from an entity callback, it gives the string inside the entity.
mapping(string:mixed) tag_args(void|mixed default_value)
Gives the arguments of the current tag, parsed to a convenient mapping consisting of key:value pairs. If the current thing isn't a tag, it gives zero. default_value is used for arguments which have no value in the tag. If default_value isn't given, the value is set to the same string as the key.
string tag_content()
Gives the content of the current tag, if it's a container or quote tag. Otherwise returns zero.
array tag(void|mixed default_value)
Returns the equivalent of the following calls.
|
string context()
Returns the current output context as a string.
|
The return value can also be a single character string, in which case the context is a quoted argument. The string contains the starting quote character.
This function is typically only useful in entity callbacks, which can be called both from text and argument values of different sorts.
splice_arg
string parse_tag_name(string tag)
Parses the tag name from a tag string without the surrounding
brackets, i.e. a string on the form "tagname some='tag'
args"
.
Returns the tag name or an empty string if none.
mapping parse_tag_args(string tag)
Parses the tag arguments from a tag string without the name and
surrounding brackets, i.e. a string on the form "some='tag'
args"
.
Returns a mapping containing the tag arguments.
tag_args
mapping _inspect()
This is a low-level way of debugging a parser. This gives a mapping of the internal state of the Parser.HTML object.
The format and contents of this mapping may change without further notice.
Parser.HTML clone(mixed ... args)
Clones the Parser.HTML object. A new object of the same class is created, filled with the parse setup from the old object.
This is the simpliest way of flushing a parse feed/output.
The arguments to clone is sent to the new object, simplifying work for custom classes that inherits Parser.HTML .
Returns the new object.
create is called _before_ the setup is copied.
Parser.HTML set_extra(mixed ... args)
Sets the extra arguments passed to all tag, container and entity callbacks.
Returns the object being called.
array get_extra()
Gets the extra arguments set by set_extra() .
Returns the object being called.
string splice_arg(void|string name)
If given a string, it sets the splice argument name to it. It returns the old splice argument name.
If a splice argument name is set, it's parsed in all tags, both those with callbacks and those without. Wherever it occurs, its value (after being parsed for entities in the normal way) is inserted directly into the tag. E.g:
<foo arg1="val 1" splice="arg2='val 2' arg3" arg4>
becomes
<foo arg1="val 1" arg2='val 2' arg3 arg4>
if "splice"
is set as the splice argument name.
int case_insensitive_tag(void|int value)
All tags and containers are matched case insensitively, and argument names are converted to lowercase. Tags added with add_quote_tag() are not affected, though. Switching to case insensitive mode and back won't preserve the case of registered tags and containers.
int ignore_tags(void|int value)
Do not look for tags at all. Normally tags are matched even when
there's no callbacks for them at all. When this is set, the tag
delimiters '<'
and '>'
will be treated as any
normal character.
int ignore_unknown(void|int value)
Treat unknown tags and entities as text data, continuing parsing for tags and entities inside them.
When functions are specified with _set_tag_callback() or _set_entity_callback() , all tags or entities, respectively, are considered known. However, if one of those functions return 1 and ignore_unknown is set, they are treated as text data instead of making another call to the same function again.
int lazy_argument_end(void|int value)
A '>'
in a tag argument closes both the argument and the
tag, even if the argument is quoted.
int lazy_entity_end(void|int value)
Normally, the parser search indefinitely for the entity end
character (i.e. ';'
). When this flag is set, the
characters '&'
, '<'
, '>'
, '"'
,
'''
, and any whitespace breaks the search for the entity
end, and the entity text is then ignored, i.e. treated as
data.
int match_tag(void|int value)
Unquoted nested tag starters and enders will be balanced when parsing tags. This is the default.
int max_parse_depth(void|int value)
Maximum recursion depth during parsing. Recursion occurs when a
tag/container/entity/quote tag callback function returns a string
to be reparsed. The default value is 10
.
int mixed_mode(void|int value)
Allow callbacks to return arbitrary data in the arrays, which will be concatenated in the output.
int reparse_strings(void|int value)
When a plain string is used as a tag/container/entity/quote tag callback, it's not reparsed if this flag is unset. Setting it causes all such strings to be reparsed.
int ws_before_tag_name(void|int value)
Allow whitespace between the tag start character and the tag name.
int xml_tag_syntax(void|int value)
Whether or not to use XML syntax to tell empty tags and container tags apart.
|
int nestling_entity_end(void|int value)
int ignore_comments(void|int value)
Module Parser |
HTML get_xml_parser()
Returns a Parser.HTML initialized for parsing XML. It has all the flags set properly for XML syntax and callbacks to ignore comments, CDATA blocks and unknown PI tags, but it has no registered tags and doesn't decode any entities.
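As a rough illustration of building on the parser returned by get_xml_parser(), the sketch below registers a container callback that collects element contents. The helper name titles_of and the "title" element are made up for the example, and the callback signature (parser, argument mapping, content) is assumed from the usual Parser.HTML container conventions rather than stated here.

```pike
// Collect the contents of every <title> element (hypothetical element name).
array(string) titles_of(string xml)
{
  array(string) titles = ({});
  Parser.HTML p = Parser.get_xml_parser();
  p->add_container("title",
                   lambda(Parser.HTML q, mapping args, string content) {
                     titles += ({ content });
                     return ({});  // array result: emit nothing, no reparse
                   });
  p->finish(xml);
  return titles;
}

int main()
{
  write("%O\n", titles_of("<doc><title>A</title><title>B</title></doc>"));
  return 0;
}
```

Returning an empty array from the callback keeps the parser from reparsing anything, which is the cheapest option when only the side effect matters.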
string decode_numeric_xml_entity(string chref)
Decodes the numeric XML entity chref, e.g. "&#x34;" (the character "4"), and returns the character as a string. chref is the name part of the entity, i.e. without the leading '&' and trailing ';'. Returns zero if chref isn't of a recognized form or if the character number is too large to be represented in a string.
HTML html_entity_parser()
string parse_html_entities(string in)
HTML html_entity_parser(int noerror)
string parse_html_entities(string in, int noerror)
Parse any HTML entities in the string to Unicode characters. html_entity_parser() returns a complete parser (to build on or use), while parse_html_entities() parses a string directly. An error is thrown if there is an unrecognized entity in the string, unless noerror is set.
Currently using XHTML 1.0 tables.
string encode_html_entities(string raw)
Encode characters to HTML entities, e.g. turning "<" into "&lt;".
The characters that will be encoded are characters <= 32, "\"&'<>", characters >= 127 and <= 160, and characters >= 255.
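A minimal sketch of how the entity helpers fit together: a string is encoded, decoded again, and a numeric character reference is decoded on its own. The helper name roundtrip and the sample strings are illustrative only.

```pike
// Encode a string to HTML entities and decode it again.
string roundtrip(string raw)
{
  string encoded = Parser.encode_html_entities(raw);
  return Parser.parse_html_entities(encoded);
}

int main()
{
  string raw = "1 < 2 & \"quoted\"";
  write("%s\n", Parser.encode_html_entities(raw));
  write("%O\n", roundtrip(raw) == raw);
  // chref is passed without the leading '&' and the trailing ';'.
  write("%O\n", Parser.decode_numeric_xml_entity("#x34"));
  return 0;
}
```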
CLASS Parser.HTML |
This is a simple parser for SGML structured markups. It's not really HTML, but it's useful for that purpose.
The simple way to use it is to give it some information about available tags and containers, and which callbacks to call for them.
The object is easily reused, by calling the clone() function.
add_tag , add_container , finish
Parser.HTML _set_tag_callback(function|string|array to_call)
Parser.HTML _set_entity_callback(function|string|array to_call)
Parser.HTML _set_data_callback(function|string|array to_call)
These functions set up the parser object to call the given callbacks upon tags, entities and/or data. The callbacks will only be called if there isn't another tag/container/entity handler for these.
The callback function will be called with the parser object as first argument, and the active string as second. Note that no parsing of the contents has been done. The callback is called for both end tags and normal tags; there is no container parsing.
The return values from the callbacks are handled in the same way as the return values from callbacks registered with add_tag and similar functions.
The data callback will be called as seldom as possible with the longest possible string, as long as it doesn't get called out of order with any other callback. It will never be called with a zero length string.
If a string or array is given instead of a function, it will act as the return value from the function. Arrays or empty strings are probably preferable to avoid recursion.
Returns the object being called.
Parser.HTML add_tag(string name, mixed to_do)
Parser.HTML add_container(string name, mixed to_do)
Parser.HTML add_entity(string entity, mixed to_do)
Parser.HTML add_quote_tag(string name, mixed to_do, string end)
Parser.HTML add_tags(mapping(string:mixed) tags)
Parser.HTML add_containers(mapping(string:mixed) containers)
Parser.HTML add_entities(mapping(string:mixed) entities)
Registers the actions to take when parsing various things. Tags, containers, entities are as usual. add_quote_tag() adds a special kind of tag that reads any data until the next occurrence of the end string immediately before a tag end.
This argument can be any of the following.
|
The callback function can return:
|
Returns the object being called.
tags , containers , entities
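The registration functions can be sketched as follows. The example assumes the container callback receives the parser, the argument mapping and the content, in that order; the names emphasize and render are hypothetical. The callback returns an array so that its result is not reparsed, and the entity is registered with a plain string, which is not reparsed unless reparse_strings is set.

```pike
// Container callback: wrap the content in asterisks.
array(string) emphasize(Parser.HTML p, mapping(string:string) args,
                        string content)
{
  return ({ "*" + content + "*" });  // array: inserted without reparsing
}

string render(string markup)
{
  Parser.HTML p = Parser.HTML();
  p->add_container("b", emphasize);
  p->add_entity("amp", "&");         // plain string replacement
  return p->finish(markup)->read();
}

int main()
{
  write("%s\n", render("a <b>x</b> &amp; c"));
  return 0;
}
```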
Parser.HTML clear_tags()
Parser.HTML clear_containers()
Parser.HTML clear_entities()
Parser.HTML clear_quote_tags()
Removes all registered definitions in the different categories.
Returns the object being called.
add_tag , add_tags , add_container , add_containers , add_entity , add_entities
mapping(string:mixed) tags()
mapping(string:mixed) containers()
mapping(string:mixed) entities()
Returns the current callback settings. When matching is done case insensitively, all names will be returned in lowercase.
Implementation note: These run in constant time since they return copy-on-write mappings.
add_tag , add_tags , add_container , add_containers , add_entity , add_entities
mapping(string:array(mixed|string)) quote_tags()
Returns the current callback settings. The values are arrays ({callback, end_quote}). When matching is done case insensitively, all names will be returned in lowercase.
Implementation note: quote_tags() allocates a new mapping for every call and thus, unlike e.g. tags() runs in linear time.
add_quote_tag
Parser.HTML feed()
Parser.HTML feed(string s, void|int do_parse)
Feed new data to the Parser.HTML object. This will start a scan and may result in callbacks. Note that not all of the data fed is necessarily processed; to make sure it is, call finish().
If the function is called without arguments, no data is fed, but the parser is run. If the string argument is followed by a 0, as in ->feed(s,0);, the string is fed, but the parser isn't run.
Returns the object being called.
finish , read , feed_insert
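A small sketch of incremental feeding, assuming the input arrives in arbitrary chunks that may split a tag in the middle. The helper name parse_chunks and the "br" replacement are illustrative.

```pike
// Feed the input piece by piece, then flush and read the result.
string parse_chunks(array(string) chunks)
{
  Parser.HTML p = Parser.HTML();
  p->add_tag("br", "\n");
  foreach (chunks, string chunk)
    p->feed(chunk);   // an incomplete tag at the end stays unprocessed
  p->finish();        // process whatever is still pending
  return p->read();
}

int main()
{
  // The <br> tag is split across two chunks.
  write("%O\n", parse_chunks(({ "one<b", "r>two" })));
  return 0;
}
```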
Parser.HTML feed_insert(string s)
This pushes a string on the parser stack.
Returns the object being called.
Don't use!
Parser.HTML finish()
Parser.HTML finish(string s)
Finish a parser pass. A string may be sent here, similar to feed().
Returns the object being called.
string|array(mixed) read()
string|array(mixed) read(int max_elems)
Read parsed data from the parser object.
Returns a string of parsed data if the parser isn't in mixed_mode , an array of arbitrary data otherwise.
Parser.HTML write_out(mixed ... args)
Send data to the output stream, i.e. it won't be parsed and it won't be sent to the data callback, if any.
Any data is allowed when the parser is running in mixed_mode . Only strings are allowed otherwise.
Returns the object being called.
array(int) at()
int at_line()
int at_char()
int at_column()
Returns the current position. Characters and columns count from 0, lines count from 1.
at() gives an array with the following layout.
|
string current()
Gives the current range of data, i.e. the whole tag/entity/etc. being parsed in the current callback. Returns zero if there's no current range, i.e. when the function is not called in a callback.
string tag_name()
Gives the name of the current tag, or zero. If used from an entity callback, it gives the string inside the entity.
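The position and tag inspection functions are meant to be called from inside a callback. The sketch below uses a general tag callback to record the line and name of each tag; list_tags is a hypothetical helper, and the exact position reported for a tag that spans several lines is not covered here.

```pike
// Record "line: name" for every tag seen by the general tag callback.
array(string) list_tags(string markup)
{
  array(string) seen = ({});
  Parser.HTML p = Parser.HTML();
  p->_set_tag_callback(lambda(Parser.HTML q, string tag) {
    seen += ({ sprintf("%d: %s", q->at_line(), q->tag_name()) });
    return ({});  // array result: emit nothing and avoid reparsing
  });
  p->finish(markup);
  return seen;
}

int main()
{
  write("%O\n", list_tags("<a>\n<b>"));
  return 0;
}
```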
mapping(string:mixed) tag_args(void|mixed default_value)
Gives the arguments of the current tag, parsed to a convenient mapping consisting of key:value pairs. If the current thing isn't a tag, it gives zero. default_value is used for arguments which have no value in the tag. If default_value isn't given, the value is set to the same string as the key.
string tag_content()
Gives the content of the current tag, if it's a container or quote tag. Otherwise returns zero.
array tag(void|mixed default_value)
Returns the equivalent of the following calls.
|
string context()
Returns the current output context as a string.
|
The return value can also be a single character string, in which case the context is a quoted argument. The string contains the starting quote character.
This function is typically only useful in entity callbacks, which can be called both from text and argument values of different sorts.
splice_arg
string parse_tag_name(string tag)
Parses the tag name from a tag string without the surrounding brackets, i.e. a string of the form "tagname some='tag' args".
Returns the tag name or an empty string if none.
mapping parse_tag_args(string tag)
Parses the tag arguments from a tag string without the name and surrounding brackets, i.e. a string of the form "some='tag' args".
Returns a mapping containing the tag arguments.
tag_args
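Both helpers can be tried on the example strings used in the descriptions above. The wrapper names name_of and args_of are hypothetical, and the output is printed rather than asserted.

```pike
// Thin wrappers around the two parsing helpers.
string name_of(string tag)
{
  return Parser.HTML()->parse_tag_name(tag);
}

mapping args_of(string tag)
{
  return Parser.HTML()->parse_tag_args(tag);
}

int main()
{
  write("%O\n", name_of("tagname some='tag' args"));
  write("%O\n", args_of("some='tag' args"));
  return 0;
}
```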
mapping _inspect()
This is a low-level way of debugging a parser. This gives a mapping of the internal state of the Parser.HTML object.
The format and contents of this mapping may change without further notice.
Parser.HTML clone(mixed ... args)
Clones the Parser.HTML object. A new object of the same class is created, filled with the parse setup from the old object.
This is the simplest way of flushing a parse feed/output.
The arguments to clone are sent to the new object, simplifying work for custom classes that inherit Parser.HTML .
Returns the new object.
create is called _before_ the setup is copied.
Parser.HTML set_extra(mixed ... args)
Sets the extra arguments passed to all tag, container and entity callbacks.
Returns the object being called.
array get_extra()
Gets the extra arguments set by set_extra() .
Returns the array of extra arguments.
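A minimal sketch of passing extra arguments to a callback, assuming they are appended after the callback's regular arguments (parser and argument mapping for a tag). The helper name stamp and the "stamp" tag are made up for the example.

```pike
// Replace every <stamp> tag with the label given through set_extra().
string stamp(string markup, string label)
{
  Parser.HTML p = Parser.HTML();
  p->set_extra(label);
  p->add_tag("stamp",
             lambda(Parser.HTML q, mapping args, string extra) {
               return ({ "[" + extra + "]" });  // array: not reparsed
             });
  return p->finish(markup)->read();
}

int main()
{
  write("%s\n", stamp("a<stamp>b", "v1"));
  return 0;
}
```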
string splice_arg(void|string name)
If given a string, it sets the splice argument name to it. It returns the old splice argument name.
If a splice argument name is set, it's parsed in all tags, both those with callbacks and those without. Wherever it occurs, its value (after being parsed for entities in the normal way) is inserted directly into the tag. E.g:
<foo arg1="val 1" splice="arg2='val 2' arg3" arg4>
becomes
<foo arg1="val 1" arg2='val 2' arg3 arg4>
if "splice" is set as the splice argument name.
int case_insensitive_tag(void|int value)
All tags and containers are matched case insensitively, and argument names are converted to lowercase. Tags added with add_quote_tag() are not affected, though. Switching to case insensitive mode and back won't preserve the case of registered tags and containers.
int ignore_tags(void|int value)
Do not look for tags at all. Normally tags are matched even when there are no callbacks for them at all. When this is set, the tag delimiters '<' and '>' will be treated as any normal character.
int ignore_unknown(void|int value)
Treat unknown tags and entities as text data, continuing parsing for tags and entities inside them.
When functions are specified with _set_tag_callback() or _set_entity_callback() , all tags or entities, respectively, are considered known. However, if one of those functions returns 1 and ignore_unknown is set, they are treated as text data instead of making another call to the same function again.
int lazy_argument_end(void|int value)
A '>' in a tag argument closes both the argument and the tag, even if the argument is quoted.
int lazy_entity_end(void|int value)
Normally, the parser searches indefinitely for the entity end character (i.e. ';'). When this flag is set, the characters '&', '<', '>', '"', ''', and any whitespace break the search for the entity end, and the entity text is then ignored, i.e. treated as data.
int match_tag(void|int value)
Unquoted nested tag starters and enders will be balanced when parsing tags. This is the default.
int max_parse_depth(void|int value)
Maximum recursion depth during parsing. Recursion occurs when a tag/container/entity/quote tag callback function returns a string to be reparsed. The default value is 10.
int mixed_mode(void|int value)
Allow callbacks to return arbitrary data in the arrays, which will be concatenated in the output.
int reparse_strings(void|int value)
When a plain string is used as a tag/container/entity/quote tag callback, it's not reparsed if this flag is unset. Setting it causes all such strings to be reparsed.
int ws_before_tag_name(void|int value)
Allow whitespace between the tag start character and the tag name.
int xml_tag_syntax(void|int value)
Whether or not to use XML syntax to tell empty tags and container tags apart.
|
int nestling_entity_end(void|int value)
int ignore_comments(void|int value)
CLASS Parser.SGML |
This is a handy simple parser of SGML-like syntax like HTML. It doesn't do anything more advanced than finding the corresponding end tags.
It's used like this:
array res=Parser.SGML()->feed(string)->finish()->result();
The resulting structure is an array of atoms, where the atom can be a string or a tag. A tag contains a similar array, as data.
A string
"<gat> <gurka> </gurka> <banan> <kiwi> </gat>"
results in
({
    tag "gat" object with data:
    ({
        tag "gurka" object with data:
        ({
            " "
        })
        tag "banan" object with data:
        ({
            " "
            tag "kiwi" object with data:
            ({
                " "
            })
        })
    })
})
i.e., simple "tags" (not containers) are not detected, but containers are ended implicitly by a surrounding container _with_ an end tag.
The 'tag' is an object with the following variables:
string name;            - name of tag
mapping args;           - argument to tag
int line,char,column;   - position of tag
string file;            - filename (see create)
array(SGMLatom) data;   - contained data
string Parser.SGML.file
void Parser.SGML()
void Parser.SGML(string filename)
This object is created with this filename. It's passed to all created tags, for debug and trace purposes.
No, it doesn't read the file itself. See feed() .
object feed(string s)
array(SGMLatom|string) finish()
array(SGMLatom|string) result(string s)
Feed new data to the object, or finish the stream. No result can be used until finish() is called.
Both finish() and result() return the computed data.
feed() returns the called object.
CLASS Parser.SGML.SGMLatom |
string Parser.SGML.SGMLatom.name
mapping Parser.SGML.SGMLatom.args
int Parser.SGML.SGMLatom.line
int Parser.SGML.SGMLatom.char
int Parser.SGML.SGMLatom.column
string Parser.SGML.SGMLatom.file
array(SGMLatom) Parser.SGML.SGMLatom.data
CLASS Parser.RCS |
An RCS file parser that eats an RCS *,v file and presents nice Pike data structures of its contents.
inherit Parser._RCS : _RCS
string Parser.RCS.head
Version number of the head version of the file.
string|int(0..0) Parser.RCS.branch
The default branch (or revision), if present, 0 otherwise.
array(string) Parser.RCS.access
The usernames listed in the ACCESS section of the RCS file.
string|int(0..0) Parser.RCS.comment
The RCS file comment if present, 0 otherwise.
string Parser.RCS.expand
The keyword expansion options (as named by RCS) if present, 0 otherwise.
string Parser.RCS.description
The RCS file description.
mapping(string:string) Parser.RCS.locks
Maps from username to revision for users that have acquired locks on this file.
int(0..1) Parser.RCS.strict_locks
1 if strict locking is set, 0 otherwise.
mapping(string:string) Parser.RCS.tags
Maps tag names (indices) to tagged revision numbers (values).
This mapping typically contains raw revision numbers for branches (i.e. "1.1.0.2" and not "1.1.2").
mapping(string:string) Parser.RCS.branches
Maps branch numbers (indices) to branch names (values).
The indices are short branch revision numbers (i.e. "1.1.2" and not "1.1.0.2").
mapping(string:Revision) Parser.RCS.revisions
Data for all revisions of the file. The indices of the mapping are the revision numbers, whereas the values are the data from the corresponding revision.
array(Revision) Parser.RCS.trunk
Data for all revisions on the trunk, sorted in the same order as the RCS file stored them - ie descending, most recent first, I'd assume (rcsfile(5), of course, fails to state such irrelevant information).
string Parser.RCS.rcs_file_name
The filename of the RCS file as sent to create() .
void Parser.RCS(string|void file_name, string|int(0..0)|void file_contents)
Initializes the RCS object.
The path to the raw RCS file (includes trailing ",v"). Used mainly for error reporting (truncated RCS file or similar). Stored in rcs_file_name .
If a string is provided, that string will be parsed to initialize the RCS object. If a zero (0) is sent, no initialization will be performed at all. If no value is given at all, but file_name was provided, that file will be loaded and parsed for object initialization.
array parse_admin_section(string|array raw)
Lower-level API function for parsing only the admin section (the initial chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After running parse_admin_section , the RCS object will be initialized with the values for head , branch , access , branches , tokenize , tags , locks , strict_locks , comment and expand .
The tokenized RCS file, or the raw RCS-file data.
The rest of the RCS file, admin section removed.
parse_delta_sections , parse_deltatext_sections , parse , create
Does not handle rcsfile(5) newphrase skipping.
array parse_delta_sections(array raw)
Lower-level API function for parsing only the delta sections (the second chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After running parse_delta_sections , the RCS object will be initialized with the value of description and populated revisions mapping and trunk array. Their Revision members are however only populated with the members Revision->revision , Revision->branch , Revision->time , Revision->author , Revision->state , Revision->branches , Revision->rcs_next , Revision->ancestor and Revision->next .
The tokenized RCS file, with admin section removed. (See parse_admin_section .)
The rest of the RCS file, delta sections removed.
parse_admin_section , tokenize , parse_deltatext_sections , parse , create
Does not handle rcsfile(5) newphrase skipping.
array(array(string)) tokenize(string data)
Tokenize an RCS file into tokens suitable as argument to the various parse functions
The RCS file data
An array with arrays of tokens
void parse_deltatext_sections(array raw, void|function(string:void) progress_callback, array|void callback_args)
Lower-level API function for parsing only the deltatext sections (the final and typically largest chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After a parse_deltatext_sections run, the RCS object will be fully populated.
The tokenized RCS file, with admin and delta sections removed. (See parse_admin_section , tokenize and parse_delta_sections .)
This optional callback is invoked with the revision of the deltatext about to be parsed (useful for progress indicators).
Optional extra trailing arguments to be sent to progress_callback
parse_admin_section , parse_delta_sections , parse , create
Does not handle rcsfile(5) newphrase skipping.
this_program parse(array raw, void|function(string:void) progress_callback)
Parse the RCS file raw and fully initialize all members of this object.
The unprocessed RCS file.
Passed on to parse_deltatext_sections .
The fully initialized object (only returned for API convenience; the object itself is destructively modified to match the data extracted from raw )
parse_admin_section , parse_delta_sections , parse_deltatext_sections , create
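The lower-level functions can be chained by hand, roughly as sketched below. This assumes that each parse function returns the remaining data in a form the next one accepts, as the descriptions above suggest; load_rcs and the command-line path are hypothetical.

```pike
// Parse an RCS file in the three documented stages.
Parser.RCS load_rcs(string path)
{
  string data = Stdio.read_file(path);
  if (!data) error("Cannot read %s\n", path);
  Parser.RCS rcs = Parser.RCS(path, 0);         // 0: no automatic initialization
  array rest = rcs->parse_admin_section(data);  // head, branch, tags, locks, ...
  rest = rcs->parse_delta_sections(rest);       // description, revisions, trunk
  rcs->parse_deltatext_sections(rest);          // remaining per-revision data
  return rcs;
}

int main(int argc, array(string) argv)
{
  if (argc < 2) {
    werror("Usage: %s FILE,v\n", argv[0]);
    return 1;
  }
  Parser.RCS rcs = load_rcs(argv[1]);
  foreach (rcs->trunk, Parser.RCS.Revision rev)
    write("%s by %s\n", rev->revision, rev->author);
  write("%s", rcs->get_contents_for_revision(rcs->head));
  return 0;
}
```

In most programs a plain Parser.RCS(path) should give the same result; the staged form is mainly useful for stopping early, e.g. after the admin section.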
string get_contents_for_revision(string|Revision rev, void|int(0..1) dont_cache_data)
Returns the file contents from the revision rev , without performing any keyword expansion. If dont_cache_data is set we will not keep intermediate revisions in memory unless they already existed. This will cut down memory use at the expense of slow access to older revisions.
expand_keywords_for_revision()
string expand_keywords_for_revision(string|Revision rev, string|void text, int|void expansion_mode)
Expand keywords and return the resulting text according to the expansion rules set for the file.
The revision to apply the expansion for.
If supplied, keywords are expanded in this text instead, using the values that would apply for the given revision. Otherwise, the contents of revision rev are used.
Expansion mode
|
The Log keyword (which lacks sane quoting rules) is not expanded. Keyword expansion rules set in CVSROOT/cvswrappers are ignored. Only implements the -kkv, -ko and -kb expansion modes.
Does not perform any line-ending conversion.
get_contents_for_revision
CLASS Parser.RCS.DeltatextIterator |
Iterator for the deltatext sections of the RCS file. Typical usage:
string raw = Stdio.read_file(my_rcs_filename);
Parser.RCS rcs = Parser.RCS(my_rcs_filename, 0);
array rest = rcs->parse_delta_sections(rcs->parse_admin_section(raw));
foreach(rcs->DeltatextIterator(rest); int n; Parser.RCS.Revision rev)
  do_something(rev);
void Parser.RCS.DeltatextIterator(array deltatext_section, void|function(string:void) progress_callback, void|array(mixed) progress_callback_args)
the deltatext section of the RCS file in its entirety
This optional callback is invoked with the revision of the deltatext about to be parsed (useful for progress indicators).
Optional extra trailing arguments to be sent to progress_callback
the rcsfile(5) manpage outlines the sections of an RCS file
int Parser.RCS.DeltatextIterator.n
int(0..1) read_next()
Drops the leading whitespace before next revision's deltatext entry and sets this_rev to the revision number we're about to read.
int index()
the number of deltatext entries processed so far (0..N-1, N being the total number of revisions in the rcs file)
Revision value()
the Revision at whose deltatext data we are, updated with its info
int(0..1) `!()
1 if the iterator has processed all deltatext entries, 0 otherwise.
this_program `+=(int nsteps)
Advance nsteps sections.
Returns the iterator object.
int(0..1) next()
like `+=(1), but returns 0 if the iterator is finished
int(0..1) first()
Restart not implemented; always returns 0 (==failed)
int parse_deltatext_section(array raw, int o)
Chops off the first deltatext section from the token array raw and returns the rest of the string, or the value 0 (zero) if we had already visited the final deltatext entry. The deltatext's data is stored destructively in the appropriate entry of the revisions array.
raw +o must start with a deltatext entry for this method to work
does not handle rcsfile(5) newphrase skipping
if the rcs file is truncated, this method writes a descriptive error to stderr and then returns 0 - some nicer error handling wouldn't hurt
CLASS Parser.RCS.Revision |
All data tied to a particular revision of the file.
string Parser.RCS.Revision.revision
The revision number (i.e. rcs_file->revisions["1.1"]->revision == "1.1").
string Parser.RCS.Revision.author
The userid of the user that committed the revision.
array(string) Parser.RCS.Revision.branches
When there are branches from this revision, an array with the first revision number for each of the branches, otherwise 0.
Follow the next fields to get to the branch head.
string Parser.RCS.Revision.state
The state of the revision - typically "Exp" or "dead".
Calendar.TimeRange Parser.RCS.Revision.time
The (UTC) date and time when the revision was committed (second precision).
string Parser.RCS.Revision.branch
The branch name on which this revision was committed (calculated according to how cvs manages branches).
string Parser.RCS.Revision.rcs_next
The revision stored next in the RCS file, or 0 if none exists.
This field is straight from the RCS file, and has somewhat weird semantics. Usually you will want to use one of the derived fields next or prev or possibly rcs_prev .
next , prev , rcs_prev
string Parser.RCS.Revision.rcs_prev
The revision that this revision is based on, or 0 if it is the HEAD.
This is the reverse pointer of rcs_next and branches , and is used by get_contents_for_revision() when applying the deltas to set text .
rcs_next
string Parser.RCS.Revision.ancestor
The revision of the ancestor of this revision, or 0 if this was the initial revision.
next
string Parser.RCS.Revision.next
The revision that succeeds this revision, or 0 if none exists (i.e. if this is the HEAD of the trunk or of a branch).
ancestor
string Parser.RCS.Revision.log
The log message associated with the revision.
int Parser.RCS.Revision.lines
The number of lines this revision contained, altogether (not of particular interest for binary files).
added , removed
int Parser.RCS.Revision.added
The number of lines that were added from the previous revision to make this revision (for the initial revision too).
lines , removed
int Parser.RCS.Revision.removed
The number of lines that were removed from the previous revision to make this revision.
lines , added
string Parser.RCS.Revision.rcs_text
The raw delta as stored in the RCS file.
text , get_contents_for_revision()
string Parser.RCS.Revision.text
The text as committed or 0 if get_contents_for_revision() hasn't been called for this revision yet.
Typically you don't access this field directly, but use get_contents_for_revision() to retrieve it.
get_contents_for_revision() , rcs_text
CLASS Parser.Tabular |
This is a parser for line and block oriented data. It provides a flexible yet concise record-description language to parse character/column/delimiter-organised records.
Parser.LR
void Parser.Tabular(void|string|Stdio.File|Stdio.FILE input, void|array|mapping|string|Stdio.File|Stdio.FILE format, void|int verbose)
This function initialises the parser.
The input stream or string.
The format to be used (either precompiled or not). The format description language is documented under compile() .
If >1, it specifies the number of characters to display of the beginning of each record as a progress indicator. Special values are:
|
compile() , setformat() , fetch()
int skipemptylines()
This function can be used to manually skip empty lines in the input. This is unnecessary if no argument is specified for fetch() .
It returns true if EOF has been reached.
fetch()
mapping fetch(void|array|mapping format)
This function consumes as much input as needed to parse the full tabular structures at once.
Describes (precompiled only) formats to be parsed. If no format is specified, the format specified on create() is used, and empty lines are automatically skipped.
A nested mapping that contains the complete structure as described in the specified format.
If nothing matches the specified format, no input is consumed (except empty lines, if the default format is used), and zero is returned.
compile() , create() , setformat() , skipemptylines()
object feed(string content)
Is injected into the input stream.
This object.
fetch()
array|mapping setformat(array|mapping format)
Replaces the default (precompiled only) format.
The previous default format.
compile() , fetch()
array|mapping compile(string|Stdio.File|Stdio.FILE input)
Compiles the format description language into a compiled structure that can be fed to setformat , fetch , or create .
The format description is case sensitive.
The format description starts with a single line containing:
[Tabular description begin]
The format description ends with a single line containing:
[Tabular description end]
Any lines before the startline are skipped.
Any lines after the endline are not consumed.
Empty lines are skipped.
Comments start after a # or ;.
The depth level of a field is indicated by the number of leading spaces or colons at the beginning of the line.
The fieldname must not contain any whitespace.
An arbitrary number of single character field delimiters can be specified between brackets, e.g. [,;] or [,] would be used for CSV.
When field delimiters are used: for CSV-type delimiters [\t,; ] the standard CSV quoting rules apply; for other delimiters, no quoting is supported, and the last field on a line should not specify a delimiter but should specify a field width of 0 instead.
A fixed field width can be specified by a plain decimal integer. A value of 0 indicates a field of arbitrary length that extends to the end of the line.
A matching regular expression can be enclosed in ""; it has to match the complete field content and uses Regexp.SimpleRegexp syntax.
On records the following options are supported:
mandatory - This record is required.
fold - Fold this record's contents in the enclosing record.
single - This record is present at most once.
On fields the following options are supported:
drop - After reading and matching this field, drop the field content from the resulting mapping structure.
setformat() , create() , fetch()
Example of the description language:
[Tabular description begin]
csv
:gtz
::mybankno [,]
::transferdate [,]
::mutatiesoort [,]
::volgnummer [,]
::bankno [,]
::name [,]
::kostenplaats [,] drop
::amount [,]
::afbij [,]
::mutatie [,]
::reference [,]
::valutacode [,]
mt940
:messageheader1 mandatory
::exporttime "0000" drop
::CS1 " " drop
::exportday "01" drop
::exportaddress 12
::exportnumber 5 "[0-9]+"
:messageheader3 mandatory fold single
::messagetype "940" drop
::CS1 " " drop
::messagepriority "00" drop
:TRN fold
::tag ":20:" drop
::reference "GTZPB|MPBZ|INGEB"
:accountid fold
::tag ":25:" drop
::accountno 10
:statementno fold
::tag ":28C:" drop
::settlementno 0 drop
:openingbalance mandatory single
::tag ":60F:" drop
::creditdebit 1
::date 6
::currency "EUR"
::amount 0 "[0-9]+,[0-9][0-9]"
:statements
::statementline mandatory fold single
:::tag ":61:" drop
:::valuedate 6
:::creditdebit 1
:::amount "[0-9]+,[0-9][0-9]"
:::CS1 "N" drop
:::transactiontype 3 # 3 for Postbank, 4 for ING
:::paymentreference 0
::informationtoaccountowner fold single
:::tag ":86:" drop
:::accountno "[0-9]*( |)"
:::accountname 0
::description fold
:::description 0 "|[^:].*"
:closingbalance mandatory single
::tag ":62[FM]:" drop
::creditdebit 1
::date 6
::currency "EUR"
::amount 0 "[0-9]+,[0-9][0-9]"
:informationtoaccountowner fold single
::tag ":86:" drop
::debit "D" drop
::debitentries 6
::credit "C" drop
::creditentries 6
::debit "D" drop
::debitamount "[0-9]+,[0-9][0-9]"
::credit "C" drop
::creditamount "[0-9]+,[0-9][0-9]" drop
::accountname "(\n[^-:][^\n]*)*" drop
:messagetrailer mandatory single
::start "-"
::end "XXX"
[Tabular description end]
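As a rough illustration of how the description language and fetch() fit together, the following sketch parses a tiny two-column CSV input. The record and field names ("row", "first", "second") and the sample data are made up for this example, and it assumes that a format passed to the constructor as a plain string is compiled the same way compile() would compile it. The exact shape of the returned mapping is not asserted here.

```pike
int main()
{
  // Hypothetical format: one CSV record type with two comma-delimited
  // fields. The names are illustrative only.
  string description =
    "[Tabular description begin]\n"
    "csv\n"
    ":row\n"
    "::first [,]\n"
    "::second [,]\n"
    "[Tabular description end]\n";

  // Hypothetical input data.
  string data = "alpha,beta\ngamma,delta\n";

  // Assumption: a string format is accepted uncompiled by the constructor.
  Parser.Tabular parser = Parser.Tabular(data, description);

  // With no argument, fetch() uses the format given on creation and
  // skips empty lines automatically. It returns zero if nothing matches.
  mapping result = parser->fetch();
  write("%O\n", result);
  return 0;
}
```

If the same format is applied to many inputs, compiling it once with compile() and handing the result to setformat() or fetch() avoids recompiling the description each time.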
Module Parser._parser |
Low-level helpers for parsers.
You probably don't want to use the modules contained in this module directly, but instead use the other Parser modules. See instead the modules below.
Parser , Parser.C , Parser.Pike , Parser.RCS , Parser.HTML , Parser.XML
Module Parser._parser._Pike |
Low-level helpers for Parser.Pike .
You probably want to use Parser.Pike instead of this module.
Parser.Pike , _C .
array(array(string)|string) tokenize(string code)
Tokenize a string of Pike tokens.
Returns an array with Pike-level tokens and the remainder (a partial token), if any.
Module Parser._parser._RCS |
Low-level helpers for Parser.RCS .
You probably want to use Parser.RCS instead of this module.
Parser.RCS
array(array(string)) tokenize(string code)
Tokenize a string of RCS tokens.
Don't use this function directly. Use Parser.RCS.tokenize() instead.
Parser.RCS.tokenize()
Module Parser._parser._C |
Low-level helpers for Parser.C .
You probably want to use Parser.C instead of this module.
Parser.C , _Pike .
array(array(string)|string) tokenize(string code)
Tokenize a string of C tokens.
Don't use this function directly. Use Parser.C.tokenize() instead.
Returns an array with an array with C-level tokens, and the remainder (a partial token), if any.
Module Parser.XML |
string autoconvert(string xml)
CLASS Parser.XML.Simple |
array parse(string xml, string context, function cb, mixed ... extra_args)
array parse(string xml, function cb, mixed ... extra_args)
The context argument was introduced in Pike 7.8.
mixed parse_dtd(string dtd, string context, function cb, mixed ... extras)
mixed parse_dtd(string dtd, function cb, mixed ... extras)
The context argument was introduced in Pike 7.8.
string lookup_entity(string entity)
Returns the verbatim expansion of the entity.
Added in Pike 7.7.
void define_entity_raw(string entity, string raw)
Define an entity or an SMEG.
Entity name, or SMEG name (if preceded by a "%").
Verbatim expansion of the entity.
define_entity()
void define_entity(string entity, string s, function cb, mixed ... extras)
Define an entity or an SMEG.
Entity name, or SMEG name (if preceded by a "%").
Expansion of the entity. Entity evaluation will be performed.
define_entity_raw()
void allow_rxml_entities(int(0..1) yes_no)
void compat_allow_errors(string version)
Set whether the parser should allow certain errors for compatibility with earlier versions. version can be:
|
version can also be zero to enable all error checks.
CLASS Parser.XML.Simple.Context |
mixed parse_xml()
mixed parse_dtd()
string parse_entity()
void push_string(string s)
void push_string(string s, string context)
Add a string to parse at the current position.
String to insert at the current parsing position.
Optional context used to refer to the inserted string. This is typically a URL, but may also be an entity (preceded by an "&") or a SMEG reference (preceded by a "%"). Not used by the XML parser as such, but is simply passed into the callbackinfo mapping as the field "context" where it can be useful for e.g. resolving relative URLs when parsing DTDs, or for determining where errors occur.
The context argument was introduced in Pike 7.8.
void Parser.XML.Simple.Context(string s, string context, int flags, function cb, mixed ... extra_args)
void Parser.XML.Simple.Context(string s, int flags, function cb, mixed ... extra_args)
These two arguments are passed along to push_string() .
Parser flags.
Callback function. This function gets called at various stages during the parsing.
The context argument was introduced in Pike 7.8.
CLASS Parser.XML.Validating |
Validating XML parser.
Validates an XML file according to a DTD.
cf. http://www.w3.org/TR/REC-xml/
inherit .Simple : Simple
Extends the Simple XML parser.
int isname(string s)
Check if s is a valid Name.
int isnmtoken(string s)
Check if s is a valid Nmtoken.
int isnames(string s)
Check if s is a valid list of Names.
int isnmtokens(string s)
Check if s is a valid list of Nmtokens.
string get_external_entity(string sysid, string|void pubid, mapping|__deprecated__(int)|void info, mixed ... extra)
Get an external entity.
Called when a <!DOCTYPE> with a SYSTEM identifier is encountered, or when an entity reference needs expanding.
The SYSTEM identifier.
The PUBLIC identifier (if any).
The callbackinfo mapping containing the current parser state.
The extra arguments as passed to parse() or parse_dtd() .
Returns a string with a DTD fragment on success. Returns 0 (zero) on failure.
Returning zero will cause the validator to report an error.
In Pike 7.7 and earlier info had the value 0 (zero).
The default implementation always returns 0 (zero).
Override this function to provide other behaviour.
parse() , parse_dtd()
mixed validate(string kind, string name, mapping attributes, array|string contents, mapping(string:mixed) info, function(string:mixed) callback, array(mixed) extra)
The validation callback function.
::parse()
array parse(string data, string|function(string:mixed) callback, mixed ... extra)
Document this function
array parse_dtd(string data, string|function(string:mixed) callback, mixed ... extra)
Document this function
CLASS Parser.XML.Validating.Element |
XML Element node.
Module Parser.XML.Tree |
XML parser that generates node-trees.
Has some support for XML namespaces http://www.w3.org/TR/REC-xml-names/ RFC 2518 23.4.
This module defines two sets of node trees: the SimpleNode-based and the Node-based. The main difference between the two is that the Node-based trees have parent pointers, which tend to generate circular data references and thus garbage.
There are some more subtle differences between the two. Please read the documentation carefully.
constant Parser.XML.Tree.STOP_WALK
constant Parser.XML.Tree.XML_ROOT
constant Parser.XML.Tree.XML_ELEMENT
constant Parser.XML.Tree.XML_TEXT
constant Parser.XML.Tree.XML_HEADER
constant Parser.XML.Tree.XML_PI
constant Parser.XML.Tree.XML_COMMENT
constant Parser.XML.Tree.XML_DOCTYPE
constant Parser.XML.Tree.XML_ATTR
Attribute nodes are created on demand
constant Parser.XML.Tree.DTD_ENTITY
constant Parser.XML.Tree.DTD_ELEMENT
constant Parser.XML.Tree.DTD_ATTLIST
constant Parser.XML.Tree.DTD_NOTATION
constant Parser.XML.Tree.XML_NODE
string text_quote(string data)
Quotes the string given in data by escaping &, < and >.
string roxen_text_quote(string data)
Quotes strings just like text_quote, but entities of the form &foo.bar; will not be quoted.
string attribute_quote(string data)
Quotes the string given in data by escaping &, <, >, ' and ".
string roxen_attribute_quote(string data)
Quotes strings just like attribute_quote, but entities of the form &foo.bar; will not be quoted.
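A minimal sketch of the two basic quoting functions side by side. The sample string is made up for illustration. The output is printed, not asserted.

```pike
int main()
{
  // Illustrative input containing all five special characters.
  string raw = "Tom & 'Jerry' say \"1 < 2 > 0\"";

  // Escapes &, < and >; suitable for element content.
  write("%s\n", Parser.XML.Tree.text_quote(raw));

  // Additionally escapes ' and "; suitable for attribute values.
  write("%s\n", Parser.XML.Tree.attribute_quote(raw));
  return 0;
}
```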
SimpleRootNode simple_parse_input(string data, void|mapping predefined_entities, ParseFlags|void flags, string|void default_namespace)
Takes an XML string and produces a SimpleNode tree.
SimpleRootNode simple_parse_file(string path, void|mapping predefined_entities, ParseFlags|void flags, string|void default_namespace)
Loads the XML file path , creates a SimpleNode tree representation and returns the root node.
RootNode parse_input(string data, void|int(0..1) no_fallback, void|int(0..1) force_lowercase, void|mapping(string:string) predefined_entities, void|int(0..1) parse_namespaces, ParseFlags|void flags)
Takes an XML string and produces a node tree.
flags is not used for PARSE_WANT_ERROR_CONTEXT , PARSE_FORCE_LOWERCASE or PARSE_ENABLE_NAMESPACES since they are covered by the separate flag arguments.
Node parse_file(string path, int(0..1)|void parse_namespaces)
Loads the XML file path , creates a node tree representation and returns the root node.
ENUM Parser.XML.Tree.ParseFlags |
Flags used together with simple_parse_input() and simple_parse_file() .
CLASS Parser.XML.Tree.XMLNSParser |
Namespace aware parser.
mapping(string:string) Enter(mapping(string:string) attrs)
Check attrs for namespaces.
Returns the namespace expanded version of attrs .
CLASS Parser.XML.Tree.AbstractSimpleNode |
Base class for nodes.
array(AbstractSimpleNode) get_children()
Returns all the node's children.
int count_children()
Returns the number of children of the node.
AbstractSimpleNode low_clone()
Returns an initialized copy of the node.
The returned node has no children.
AbstractSimpleNode clone()
Returns a clone of the sub-tree rooted in the node.
AbstractSimpleNode get_last_child()
Returns the last child node or zero.
AbstractSimpleNode `[](int pos)
The [] operator indexes among the node children, so node[0] returns the first node and node[-1] the last.
The [] operator will select a node from all the node's children, not just its element children.
AbstractSimpleNode add_child(AbstractSimpleNode c)
Adds the given node to the list of children of this node. The new node is added last in the list.
The return value differs from the one returned by Node()->add_child() .
The current node.
AbstractSimpleNode add_child_before(AbstractSimpleNode c, AbstractSimpleNode old)
Adds the node c to the list of children of this node. The node is added before the node old, which is assumed to be an existing child of this node. The node is added last if old is zero.
The current node.
AbstractSimpleNode add_child_after(AbstractSimpleNode c, AbstractSimpleNode old)
Adds the node c to the list of children of this node. The node is added after the node old, which is assumed to be an existing child of this node. The node is added first if old is zero.
The current node.
void remove_child(AbstractSimpleNode c)
Removes all occurrences of the provided node from the list of children of this node.
void replace_children(array(AbstractSimpleNode) children)
Replaces the node's children with the provided ones.
AbstractSimpleNode replace_child(AbstractSimpleNode old, AbstractSimpleNode new)
Replaces the first occurrence of the old node child with the new node child.
The return value differs from the one returned by Node()->replace_child() .
Returns the current node on success, and 0 (zero) if the node old wasn't found.
void zap_tree()
Destruct the tree recursively. When the inheriting AbstractNode or Node is used, which have parent pointers, this function should be called for every tree that no longer is in use to avoid frequent garbage collector runs.
int walk_preorder(function(AbstractSimpleNode:int|void) callback, mixed ... args)
Traverse the node subtree in preorder, root node first, then subtrees from left to right, calling the callback function for every node. If the callback function returns STOP_WALK the traverse is promptly aborted and STOP_WALK is returned.
int walk_preorder_2(function(AbstractSimpleNode:int|void) cb_1, function(AbstractSimpleNode:int|void) cb_2, mixed ... args)
Traverse the node subtree in preorder, root node first, then subtrees from left to right. For each node we call cb_1 before iterating through children, and then cb_2 (which always gets called even if the walk is aborted earlier). If the callback function returns STOP_WALK the descent is aborted and STOP_WALK is returned once all waiting cb_2 functions have been called.
int walk_inorder(function(AbstractSimpleNode:int|void) callback, mixed ... args)
Traverse the node subtree in inorder, left subtree first, then root node, and finally the remaining subtrees, calling the function callback for every node. If the function callback returns STOP_WALK the traverse is promptly aborted and STOP_WALK is returned.
int walk_postorder(function(AbstractSimpleNode:int|void) callback, mixed ... args)
Traverse the node subtree in postorder, first subtrees from left to right, then the root node, calling the function callback for every node. If the function callback returns STOP_WALK the traverse is promptly aborted and STOP_WALK is returned.
int iterate_children(function(AbstractSimpleNode:int|void) callback, mixed ... args)
Iterates over the node's children from left to right, calling the function callback for every node. If the callback function returns STOP_WALK the iteration is promptly aborted and STOP_WALK is returned.
array(AbstractSimpleNode) get_descendants(int(0..1) include_self)
Returns a list of all descendants in document order. Includes this node if include_self is set.
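The traversal functions can be sketched as follows, using simple_parse_input() to build a SimpleNode tree and walk_preorder() to visit it. The XML string is made up for illustration, and the callback parameter is typed as a plain object so that the VirtualNode accessors (get_node_type(), get_any_name()) can be called on it. Treat this as a sketch, not a definitive usage pattern.

```pike
int main()
{
  // Illustrative document.
  string xml = "<a><b>text</b><c/></a>";
  Parser.XML.Tree.SimpleRootNode root =
    Parser.XML.Tree.simple_parse_input(xml);

  // Collect element names in preorder (root first, then subtrees
  // from left to right).
  array(string) names = ({});
  root->walk_preorder(lambda(object node) {
    if (node->get_node_type() == Parser.XML.Tree.XML_ELEMENT)
      names += ({ node->get_any_name() });
  });
  write("%s\n", names * ", ");

  // Abort the walk as soon as the element "b" is seen by returning
  // STOP_WALK from the callback.
  int visited = 0;
  int status = root->walk_preorder(lambda(object node) {
    visited++;
    if (node->get_node_type() == Parser.XML.Tree.XML_ELEMENT &&
        node->get_any_name() == "b")
      return Parser.XML.Tree.STOP_WALK;
  });
  write("visited %d nodes, aborted: %d\n",
        visited, status == Parser.XML.Tree.STOP_WALK);
  return 0;
}
```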
CLASS Parser.XML.Tree.AbstractNode |
Base class for nodes with parent pointers.
inherit AbstractSimpleNode : AbstractSimpleNode
void set_parent(AbstractNode parent)
Sets the parent node to parent .
AbstractNode get_parent()
Returns the parent node.
AbstractNode low_clone()
Returns an initialized copy of the node.
The returned node has no children, and no parent.
AbstractNode clone(void|int(-1..1) direction)
Clones the node, optionally connected to parts of the tree. If direction is -1 the cloned node's parent will be set; if direction is 1 the cloned node's children will be set.
AbstractNode get_root()
Follows all parent pointers and returns the root node.
AbstractNode add_child(AbstractNode c)
Adds the node c to the list of children of this node. The new node is added last in the list.
Returns the new child node, NOT the current node.
The new child node is returned.
AbstractNode add_child_before(AbstractNode c, AbstractNode old)
Adds the node c to the list of children of this node. The node is added before the node old, which is assumed to be an existing child of this node. The node is added last if old is zero.
The current node.
AbstractNode add_child_after(AbstractNode c, AbstractNode old)
Adds the node c to the list of children of this node. The node is added after the node old, which is assumed to be an existing child of this node. The node is added first if old is zero.
The current node.
AbstractNode tmp_add_child(AbstractNode c)
AbstractNode tmp_add_child_before(AbstractNode c, AbstractNode old)
AbstractNode tmp_add_child_after(AbstractNode c, AbstractNode old)
Variants of add_child, add_child_before and add_child_after that don't set the parent pointer in the newly added children.
This is useful while building a node tree, to get efficient refcount garbage collection if the build stops abruptly. fix_tree has to be called on the root node when the building is done.
void fix_tree()
Fix all parent pointers recursively in a tree that has been built with tmp_add_child .
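A small sketch of the build-then-fix pattern described above. It assumes that Node objects are created with the (type, name, attr, text) constructor arguments documented for VirtualNode. The element names "doc" and "item" are made up for illustration.

```pike
int main()
{
  // Assumption: Node uses the VirtualNode constructor signature.
  Parser.XML.Tree.Node root =
    Parser.XML.Tree.Node(Parser.XML.Tree.XML_ELEMENT, "doc", ([]), "");
  Parser.XML.Tree.Node child =
    Parser.XML.Tree.Node(Parser.XML.Tree.XML_ELEMENT, "item", ([]), "");

  // Add the child without setting its parent pointer, so that no
  // circular references exist while the tree is being built.
  root->tmp_add_child(child);

  // Once building is done, set all parent pointers in one pass.
  root->fix_tree();
  write("parent set: %d\n", child->get_parent() == root);

  // The tree now has parent pointers; destruct it explicitly when
  // it is no longer needed.
  root->zap_tree();
  return 0;
}
```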
void remove_child(AbstractNode c)
Removes all occurrences of the provided node from the called node's list of children. The removed node's parent reference is set to null.
void remove_node()
Removes this node from its parent. The parent reference is set to null.
void replace_children(array(AbstractNode) children)
Replaces the node's children with the provided ones. All parent references are updated.
AbstractNode replace_child(AbstractNode old, AbstractNode new)
Replaces the first occurrence of the old node child with the new node child. All parent references are updated.
The returned value is NOT the current node.
Returns the new child node.
AbstractNode replace_node(AbstractNode new)
Replaces this node with the provided one.
Returns the new node.
array(AbstractNode) get_preceding_siblings()
Returns all preceding siblings, i.e. all siblings present before this node in the parent's children list.
array(AbstractNode) get_following_siblings()
Returns all following siblings, i.e. all siblings present after this node in the parent's children list.
array(AbstractNode) get_siblings()
Returns all siblings, including this node.
array(AbstractNode) get_ancestors(int(0..1) include_self)
Returns a list of all ancestors, with the top node last. The list will start with this node if include_self is set.
array(AbstractNode) get_preceding()
Returns all preceding nodes, excluding this node's ancestors.
array(AbstractNode) get_following()
Returns all the nodes that follow the current one.
CLASS Parser.XML.Tree.VirtualNode |
Node in XML tree
mapping(string:string) get_attributes()
Returns this node's attributes, which can be altered destructively to alter the node's attributes.
mapping get_short_attributes()
Returns this node's namespace-adjusted attributes.
set_short_namespaces() must have been called before calling this function.
int get_node_type()
Returns the node type. See defined node type constants.
string get_text()
Returns text content in node.
int get_doc_order()
void set_doc_order(int o)
string get_tag_name()
Returns the name of the element node, or the nearest element above if an attribute node.
string get_any_name()
Return name of tag or name of attribute node.
void set_tag_name(string name)
Change the tag name destructively. Can only be used on element and processing-instruction nodes.
string get_namespace()
Return the (resolved) namespace for this node.
string get_full_name()
Return fully qualified name of the element node.
void Parser.XML.Tree.VirtualNode(int type, string name, mapping attr, string text)
string value_of_node()
If the node is an attribute node or a text node, its value is returned. Otherwise the child text nodes are concatenated and returned.
AbstractNode get_first_element(string|void name, int(0..1)|void full)
Returns the first element child to this node.
If provided, the first element child with that name is returned.
If specified, name matching will be done against the full name.
Returns the first matching node, and 0 if no such node was found.
array(AbstractNode) get_elements(string|void name, int(0..1)|void full)
Returns all element children to this node.
If provided, only elements with that name are returned.
If specified, name matching will be done against the full name.
Returns an array with matching nodes.
mixed cast(string to)
It is possible to cast a node to a string, which will return render_xml() for that node.
string render_xml(void|int(0..1) preserve_roxen_entities, void|mapping(string:string) namespace_lookup)
Creates an XML representation of the node sub tree. If the flag preserve_roxen_entities is set, entities of the form &foo.bar; will not be escaped.
Mapping from namespace prefix to namespace symbol prefix.
void render_to_file(Stdio.File f, void|int(0..1) preserve_roxen_entities)
Creates an XML representation of the node sub tree and streams the output to the file f. If the flag preserve_roxen_entities is set, entities of the form &foo.bar; will not be escaped.
CLASS Parser.XML.Tree.SimpleNode |
XML node without parent pointers and attribute nodes.
inherit AbstractSimpleNode : AbstractSimpleNode
inherit VirtualNode : VirtualNode
CLASS Parser.XML.Tree.Node |
XML node with parent pointers.
inherit AbstractNode : AbstractNode
inherit VirtualNode : VirtualNode
string get_tag_name()
Returns the name of the element node, or the nearest element above if an attribute node.
string get_attr_name()
Returns the name of the attribute node.
array(Node) get_attribute_nodes()
Creates and returns an array of new nodes; they will not be added as proper children to the parent node, but the parent link in the nodes are set so that upwards traversal is made possible.
CLASS Parser.XML.Tree.XMLParser |
Mixin for parsing XML.
this_program node_factory(int type, string name, mapping attr, string text)
Factory for creating nodes.
Type of node to create. One of:
|
Name of the tag if applicable.
Attributes for the tag if applicable.
Contained text of the tag if any.
This function is called during parsing to create the various XML nodes.
Overload this function to provide application-specific XML nodes.
Returns a node object representing the XML tag, or 0 (zero) if the subtree rooted in the tag should be cut.
This function is not available in Pike 7.6 and earlier.
CLASS Parser.XML.Tree.SimpleRootNode |
The root node of an XML-tree consisting of SimpleNodes.
inherit SimpleNode : SimpleNode
inherit XMLParser : XMLParser
SimpleElementNode get_element_by_id(string id, int|void force)
Find the element with the specified id.
The XML id of the node to search for.
Force a regeneration of the id lookup cache. Needed the first time after the node tree has been modified by adding or removing element nodes, or by changing the id attribute of an element node.
Returns the element node with the specified id if any. Returns UNDEFINED otherwise.
flush_node_id_cache
void flush_node_id_cache()
Clears the node id cache built and used by get_element_by_id .
CLASS Parser.XML.Tree.RootNode |
The root node of an XML-tree consisting of Nodes.
inherit Node : Node
inherit XMLParser : XMLParser
ElementNode get_element_by_id(string id, int|void force)
Find the element with the specified id.
The XML id of the node to search for.
Force a regeneration of the id lookup cache. Needed the first time after the node tree has been modified by adding or removing element nodes, or by changing the id attribute of an element node.
Returns the element node with the specified id if any. Returns UNDEFINED otherwise.
flush_node_id_cache
void flush_node_id_cache()
Clears the node id cache built and used by get_element_by_id .
Module Parser.XML.NSTree |
A namespace aware version of Parser.XML.Tree. This implementation does as little validation as possible, so e.g. you can call your namespace xmlfoo without complaints.
inherit Parser.XML.Tree : Tree
NSNode parse_input(string data, void|string default_ns)
Takes an XML string data and produces a namespace node tree. If default_ns is given, it will be used as the default namespace.
Throws an error when an error is encountered during XML parsing.
string visualize(Node n, void|string indent)
Makes a visualization of a node graph suitable for printing out on a terminal.
> object x = parse_input("<a><b><c/>d</b><b><e/><f>g</f></b></a>");
> write(visualize(x));
Node(ROOT)
  NSNode(ELEMENT,"a")
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"c")
      NSNode(TEXT)
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"e")
      NSNode(ELEMENT,"f")
        NSNode(TEXT)
Result 1: 201
CLASS Parser.XML.NSTree.NSNode |
Namespace aware node.
inherit Node : Node
string get_ns()
Returns the namespace in which the current element is defined.
string get_default_ns()
Returns the default namespace in the current scope.
mapping(string:string) get_defined_nss()
Returns a mapping with all the namespaces defined in the current scope, except the default namespace.
The returned mapping is the same as the one in the node, so destructive changes will affect the node.
mapping(string:string) get_ns_attributes(string namespace)
Returns the attributes in this node that are declared in the provided namespace.
mapping(string:mapping(string:string)) get_ns_attributes()
Returns all the attributes in all namespaces that are associated with this node.
The returned mapping is the same as the one in the node, so destructive changes will affect the node.
void add_namespace(string ns, void|string symbol, void|int(0..1) chain)
Adds a new namespace to this node. The preferred symbol to use to identify the namespace can be provided in the symbol argument. If chain is set, no attempts to overwrite an already defined namespace with the same identifier will be made.
mapping(string:string) diff_namespaces()
Returns the difference between this node's and its parent's namespaces.
string get_xml_name()
Returns the element name as it occurs in xml files. E.g. "zonk:name" for the element "name" defined in a namespace denoted with "zonk". It will look up a symbol for the namespace in the symbol tables for the node and its parents. If none is found a new label will be generated by hashing the namespace.
void remove_child(NSNode child)
remove_child has not been updated to take care of namespace issues. To properly remove all the parent's namespaces from the child, call remove_node in the child.
Module Parser.XML.SloppyDOM |
A somewhat DOM-like library that implements lazy generation of the node tree, i.e. it's generated from the data upon lookup. There's also a little bit of XPath evaluation to do queries on the node tree.
Implementation note: This is generally more pragmatic than Parser.XML.DOM , meaning it's not so pretty and compliant, but more efficient.
Implementation status: There's only enough implemented to parse a node tree from source and access it, i.e. modification functions aren't implemented. Data hiding stuff like NodeList and NamedNodeMap is not implemented, partly since it's cumbersome to meet the "live" requirement. Also, Parser.HTML is used in XML mode to parse the input. Thus it's too error tolerant to be XML compliant, and it currently doesn't handle DTD elements, like "<!DOCTYPE", or the XML declaration (i.e. "<?xml version='1.0'?>").
Document parse(string source, void|int raw_values)
Normally entities are decoded, and Node.xml_format will encode them again. If raw_values is nonzero then all text and attribute values are instead kept in their original form.
CLASS Parser.XML.SloppyDOM.Node |
Basic node.
string get_text_content()
If the raw_values flag is set in the owning document, the text is returned with entities and CDATA blocks intact.
parse
mapping(string:string)|Node|array(mapping(string:string)|Node)|string simple_path(string path, void|int xml_format)
Access a node or a set of nodes through an expression that is a subset of an XPath RelativeLocationPath in abbreviated form.
That means one or more Steps separated by "/" or "//". A Step consists of an AxisSpecifier followed by a NodeTest and then optionally by one or more Predicates.
"/" before a Step causes it to be matched only against the immediate children of the node(s) selected by the previous Step. "//" before a Step causes it to be matched against any children in the tree below the node(s) selected by the previous Step. The initial selection before the first Step is this element.
The currently allowed AxisSpecifier NodeTest combinations are:
name to select all elements with the given name. The name can be "*" to select all.
@name to select all attributes with the given name. The name can be "*" to select all.
comment() to select all comments.
text() to select all text and CDATA blocks. Note that all entity references are also selected, under the assumption that they would expand to text only.
processing-instruction("name") to select all processing instructions with the given name. The name can be left out to select all. Either ' or " may be used to delimit the name. For compatibility, it can also occur without surrounding quotes.
node() to select all nodes, i.e. the whole content of an element node.
. to select the currently selected element itself.
A Predicate has the form [PredicateExpr] where PredicateExpr currently can be in any of the following forms:
An integer indexes one item in the selected set, according to the document order. A negative index counts from the end of the set.
A RelativeLocationPath as specified above. It's executed for each element in the selected set and those where it yields an empty result are filtered out while the rest remain in the set.
A RelativeLocationPath as specified above followed by ="value". The path is executed for each element in the selected set and those where the text result of it is equal to the given value remain in the set. Either ' or " may be used to delimit the value.
If xml_format is nonzero, the return value is an xml formatted string of all the matched nodes, in document order. Otherwise the return value is as follows:
Attributes are returned as one or more index/value pairs in a mapping. Other nodes are returned as the node objects. If the expression is on a form that can give at most one answer (i.e. there's a predicate with an integer index) then a single mapping or node is returned, or zero if there was no match. If the expression can give more answers then the return value is an array containing zero or more attribute mappings and/or nodes. The array follows document order.
Not DOM compliant.
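To make the path syntax a little more concrete, here is a hedged sketch that calls simple_path() on a parsed document. The XML content is made up for illustration, and it assumes that simple_path() can be called directly on the Document returned by parse(). Results are printed, not asserted.

```pike
int main()
{
  // Illustrative document with two sibling elements.
  object doc = Parser.XML.SloppyDOM.parse(
    "<list><item id='a'>one</item><item id='b'>two</item></list>");

  // With xml_format nonzero, the matched nodes are returned as one
  // XML-formatted string in document order.
  write("%s\n", doc->simple_path("list/item", 1));

  // A negative integer predicate counts from the end of the set. With
  // an integer index, a single node (or zero) is returned.
  mixed last = doc->simple_path("list/item[-1]");
  if (last)
    write("%s\n", last->get_text_content());

  // The node tree is likely cyclic, so destruct it when done.
  destruct(doc);
  return 0;
}
```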
string xml_format()
Returns the formatted XML that corresponds to the node tree.
Not DOM compliant.
CLASS Parser.XML.SloppyDOM.NodeWithChildElements |
Node with child elements.
inherit NodeWithChildren : NodeWithChildren
array(Element) get_elements(string name)
Lightweight variant of get_elements_by_tag_name that returns a simple array instead of a fancy live NodeList.
Not DOM compliant.
array(Element) get_descendant_elements()
Returns all descendant elements in document order.
Not DOM compliant.
array(Node) get_descendant_nodes()
Returns all descendant nodes (except attribute nodes) in document order.
Not DOM compliant.
CLASS Parser.XML.SloppyDOM.Document |
The node tree is very likely a cyclic structure, so it might be a good idea to destruct it when you're finished with it, to avoid garbage. Destructing the Document object always destroys all nodes in it.
inherit NodeWithChildElements : NodeWithChildElements
array(Element) get_elements(string name)
Note that this one looks among the top level elements, as opposed to get_elements_by_tag_name . This means that if the document is correct, you can only look up the single top level element here.
Not DOM compliant.
int get_raw_values()
Not DOM compliant.
Module Parser.LR |
LALR(1) parser generator.
ENUM Parser.LR.SeverityLevel |
Severity level
CLASS Parser.LR.Priority |
Specifies the priority and associativity of a rule.
int Parser.LR.Priority.value
Priority value
int Parser.LR.Priority.assoc
Associativity
void Parser.LR.Priority(int p, int a)
Create a new priority object.
Priority.
Associativity.
CLASS Parser.LR.Rule |
This object is used to represent a BNF-rule in the LR parser.
int Parser.LR.Rule.nonterminal
Non-terminal this rule reduces to.
array(string|int) Parser.LR.Rule.symbols
The actual rule
function|string Parser.LR.Rule.action
Action to do when reducing this rule. If it is a function, that function is called. If it is a string, the function with that name in the object given to the parser is called. The function is called with arguments corresponding to the values of the elements of the rule. The return value of the function will be the value of this non-terminal. The default rule is to return the first argument.
int Parser.LR.Rule.has_tokens
This rule contains tokens
int Parser.LR.Rule.num_nonnullables
This rule has this many non-nullable symbols at the moment.
int Parser.LR.Rule.number
Sequence number of this rule (used for conflict resolution). Also used to identify the rule.
Priority Parser.LR.Rule.pri
Priority and associativity of this rule.
void Parser.LR.Rule(int nt, array(string|int) r, function|string|void a)
Create a BNF rule.
The rule
rule : nonterminal ":" symbols ";" { add_rule };
might be created as
Rule(4, ({ 9, ":", 5, ";" }), "add_rule");
where 4 corresponds to the nonterminal "rule", 9 to "nonterminal" and 5 to "symbols", and the function "add_rule" is to be called when this rule is reduced.
Non-terminal to reduce to.
Symbol sequence that reduces to nt.
Action to do when reducing according to this rule. If it is a function, that function is called. If it is a string, the function with that name in the object given to the parser is called. The function is called with arguments corresponding to the values of the elements of the rule. The return value of the function will become the value of this non-terminal. The default rule is to return the first argument.
CLASS Parser.LR.ErrorHandler |
Class handling reporting of errors and warnings.
optional int(-1..1) Parser.LR.ErrorHandler.verbose
Verbosity level
void Parser.LR.ErrorHandler(int(-1..1)|void verbosity)
Create a new error handler.
Level of verbosity.
verbose
CLASS Parser.LR.Parser |
This object implements an LALR(1) parser and compiler.
Normal use of this object would be:
set_error_handler
{add_rule, set_priority, set_associativity}*
set_symbol_to_string
compile
{parse}*
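The call sequence above can be sketched as follows. This is a minimal illustration and not taken from the manual: it assumes that nonterminal 0 is the start symbol and that its rule ends with the EOF terminal "", which the text above does not state. The class Actions and the helpers make_scanner and count_as are hypothetical names introduced only for this example.

```pike
// Sketch: a grammar that accepts one or more "a" tokens and counts them.
// ASSUMPTION: nonterminal 0 is the start symbol, and its rule ends with
// the EOF terminal "" (not confirmed by the text above).

// Hypothetical action object; actions given as strings are looked up here.
class Actions
{
  int one(string a) { return 1; }               // list : "a"
  int more(int n, string a) { return n + 1; }   // list : list "a"
}

// Hypothetical scanner factory: returns the tokens one at a time,
// then the empty string to signal EOF.
function(void:string) make_scanner(array(string) tokens)
{
  int pos;
  return lambda() { return pos < sizeof(tokens) ? tokens[pos++] : ""; };
}

mixed count_as(array(string) tokens)
{
  Parser.LR.Parser p = Parser.LR.Parser();

  p->set_error_handler();   // No argument: use the built-in handler.

  p->add_rule(Parser.LR.Rule(0, ({ 1, "" })));           // start : list EOF
  p->add_rule(Parser.LR.Rule(1, ({ "a" }), "one"));      // list : "a"
  p->add_rule(Parser.LR.Rule(1, ({ 1, "a" }), "more"));  // list : list "a"

  p->compile();             // Must be done before parse().

  // The second argument resolves the string actions "one" and "more".
  return p->parse(make_scanner(tokens), Actions());
}
```

Since the grammar is left-recursive, each additional "a" triggers one reduction through "more", so the value carried up to the start rule is the running count. The built-in error handler may write notices to stderr while rules are added and compiled.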
mapping(int:array(Rule)) Parser.LR.Parser.grammar
The grammar itself.
Kernel Parser.LR.Parser.start_state
The initial LR0 state.
int Parser.LR.Parser.lr_error
Error code
mapping(string:Kernel) Parser.LR.Parser.known_states
LR0 states that are already known to the compiler.
function(SeverityLevel:void) Parser.LR.Parser.error_handler
Compile error and warning handler.
string rule_to_string(Rule r)
Pretty-prints a rule to a string.
Rule to print.
string item_to_string(Item i)
Pretty-prints an item to a string.
Item to pretty-print.
string state_to_string(Kernel state)
Pretty-prints a state to a string.
State to pretty-print.
string _sprintf()
Pretty-prints the current grammar to a string.
mixed cast(string type)
Implements casting.
Type to cast to.
void set_priority(string terminal, int pri_val)
Sets the priority of a terminal.
Terminal to set the priority for.
Priority; higher = prefer this terminal.
void set_associativity(string terminal, int assoc)
Sets the associativity of a terminal.
Terminal to set the associativity for.
Associativity; negative - left, positive - right, zero - no associativity.
void set_symbol_to_string(void|function(int|string:string) s_to_s)
Sets the symbol to string conversion function. The conversion function is used by the various *_to_string functions to make comprehensible output.
Symbol to string conversion function. If zero or not specified, use the built-in function.
void set_error_handler(void|function(SeverityLevel:void) handler)
Sets the error report function.
Function to call to report errors and warnings. If zero or not specified, use the built-in function.
void add_rule(Rule r)
Add a rule to the grammar.
Rule to add.
StateQueue Parser.LR.Parser.s_q
Contains all states used. In the queue section are the states that remain to be compiled.
int compile()
Compiles the grammar into a parser, so that parse() can be called.
mixed parse(object|function(void:string|array(string|mixed)) scanner, void|object action_object)
Parse the input according to the compiled grammar. The last value reduced is returned.
The parser must have been compiled (with compile()) prior to calling this function.
Errors should be throw()n.
The scanner function. It returns the next symbol from the input. It should either return a string (terminal) or an array with a string (terminal) and a mixed (value). EOF is indicated with the empty string.
Object used to resolve those actions that have been specified as strings.
CLASS Parser.LR.Parser.Item |
An LR(0) item, a partially parsed rule.
Rule Parser.LR.Parser.Item.r
The rule
int Parser.LR.Parser.Item.offset
How long into the rule the parsing has come.
Kernel Parser.LR.Parser.Item.next_state
The state we will get if we shift according to this rule
Item Parser.LR.Parser.Item.master_item
Item representing this one (used for shifts).
multiset(string) Parser.LR.Parser.Item.direct_lookahead
Look-ahead set for this item.
multiset(string) Parser.LR.Parser.Item.error_lookahead
Look-ahead set used for detecting conflicts
multiset(Item) Parser.LR.Parser.Item.relation
Relation to other items (used when compiling).
int Parser.LR.Parser.Item.counter
Depth counter (used when compiling).
int Parser.LR.Parser.Item.number
Item identification number (used when compiling).
int Parser.LR.Parser.Item.item_id
Used to identify the item. Equal to r->number + offset.
CLASS Parser.LR.Parser.Kernel |
Implements an LR(1) state
multiset(Rule) Parser.LR.Parser.Kernel.rules
Used to check if a rule already has been added when doing closures.
array(Item) Parser.LR.Parser.Kernel.items
Contains the items in this state.
mapping(int:Item) Parser.LR.Parser.Kernel.item_id_to_item
Used to lookup items given rule and offset
mapping(int:multiset(Item)) Parser.LR.Parser.Kernel.symbol_items
Contains the items whose next symbol is this non-terminal.
mapping(int|string:Kernel|Rule) Parser.LR.Parser.Kernel.action
The action table for this state
object(kernel) SHIFT to this state on this symbol.
object(rule) REDUCE according to this rule on this symbol.
multiset Parser.LR.Parser.Kernel.closure_set
The symbols that closure has been called on.
void add_item(Item i)
Add an item to the state.
void closure(int nonterminal)
Make the closure of this state.
Nonterminal to make the closure on.
multiset(int|string) goto_set()
Make the goto-set of this state.
Kernel do_goto(int|string symbol)
Generates the state reached when doing goto on the specified symbol, i.e. it compiles the LR(0) state.
Symbol to make goto on.
CLASS Parser.LR.Parser.StateQueue |
This is a queue, which keeps the elements even after they are retrieved.
int Parser.LR.Parser.StateQueue.head
Index of the head of the queue.
int Parser.LR.Parser.StateQueue.tail
Index of the tail of the queue.
array(Kernel) Parser.LR.Parser.StateQueue.arr
The queue itself.
Kernel push(Kernel state)
Pushes the state on the queue.
State to push.
Kernel next()
Return the next state from the queue.
Module Parser.LR.GrammarParser |
This module generates an LR parser from a grammar specified according to the following grammar:
directives : directive ;
directives : directives directive ;
directive : declaration ;
directive : rule ;
declaration : "%token" terminals ";" ;
rule : nonterminal ":" symbols ";" ;
rule : nonterminal ":" symbols action ";" ;
symbols : symbol ;
symbols : symbols symbol ;
terminals : terminal ;
terminals : terminals terminal ;
symbol : nonterminal ;
symbol : "string" ;
action : "{" "identifier" "}" ;
nonterminal : "identifier" ;
terminal : "string";
int Parser.LR.GrammarParser.lr_error
Error code from the parsing.
Parser make_parser(string str, object|void m)
Compiles the parser-specification given in the first argument. Named actions are taken from the object if available, otherwise left as is.
Returns the error code in both GrammarParser.lr_error and return_value->lr_error.
int|Parser make_parser_from_file(string fname, object|void m)
Compiles the file specified in the first argument into an LR parser.
make_parser
Module Parser.Python |
array(string) split(string data)
Splits the provided string of Python code into an array of tokens.
Module Parser.Pike |
This module parses and tokenizes Pike source code.
inherit "C.pmod"
array(string) split(string data, void|mapping state)
Splits the data string into an array of tokens. An additional element with a newline will be added to the resulting array of tokens. If the optional argument state is provided the split function is able to pause and resume splitting inside #"" and /**/ tokens. The state argument should be an initially empty mapping, in which split will store its state between successive calls.
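As a small illustration of split() in its simplest form (without the state argument), the sketch below tokenizes a Pike source string and drops the tokens that consist only of whitespace. The helper name significant_tokens is hypothetical, and comment tokens are deliberately left in.

```pike
// Return the tokens of a Pike source string, with tokens that
// contain nothing but spaces, tabs and line breaks removed.
array(string) significant_tokens(string code)
{
  return filter(Parser.Pike.split(code),
                lambda(string tok) {
                  // Subtracting a string removes all its occurrences,
                  // so an all-whitespace token ends up empty.
                  return sizeof(tok - " " - "\t" - "\n" - "\r");
                });
}
```

Filtering after splitting keeps the original token array available, which matters if the text is to be put back together later.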
CLASS Parser.Pike.UnterminatedStringError |
Error thrown when an unterminated string token is encountered.
inherit Error.Generic : Generic
string Parser.Pike.UnterminatedStringError.err_str
The string that failed to be tokenized
Module Parser.C |
array(string) split(string data, void|mapping state)
Splits the data string into an array of tokens. An additional element with a newline will be added to the resulting array of tokens. If the optional argument state is provided the split function is able to pause and resume splitting inside /**/ tokens. The state argument should be an initially empty mapping, in which split will store its state between successive calls.
array(Token) tokenize(array(string) s, void|string file)
Returns an array of Token objects given an array of string tokens.
array(Token|array) group(array(string|Token) tokens, void|mapping(string:string) groupings)
Fold sub blocks of an array of tokens into sub arrays, for grouping purposes.
The token array to fold.
Supplies the tokens marking the boundaries of blocks to fold. The indices of the mapping mark the start of a block, the corresponding values mark where the block ends. The sub arrays will start and end in these tokens. If no groupings mapping is provided, {}, () and [] are used as block boundaries.
array(Token|array) strip_line_statements(array(Token|array) tokens)
Strips off all (preprocessor) line statements from a token array.
array hide_whitespaces(array tokens)
Folds all whitespace tokens into the previous token's trailing_whitespaces.
string simple_reconstitute(array(string|Token|array) tokens)
Reconstitutes the token array into a plain string again; essentially reversing split() and whichever of the tokenize, group and hide_whitespaces methods may have been invoked.
string reconstitute_with_line_numbers(array(string|Token|array) tokens)
Like simple_reconstitute, but adding additional #line n "file" preprocessor statements in the output wherever a new line or file starts.
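The functions above are typically chained. The following sketch, using only the signatures listed in this section, splits a C fragment, wraps the strings as Token objects, folds the bracketed blocks and then flattens everything back into text. The helper names tokens_of and roundtrip and the file name "example.c" are made up for the example.

```pike
// Split C source into string tokens, wrap them as Token objects,
// and fold {}, () and [] blocks into sub arrays (default groupings).
array tokens_of(string code, void|string file)
{
  array(string) raw = Parser.C.split(code);
  array(Parser.C.Token) toks = Parser.C.tokenize(raw, file);
  return Parser.C.group(toks);
}

// Turn the grouped token tree back into a plain string.
string roundtrip(string code)
{
  return Parser.C.simple_reconstitute(tokens_of(code, "example.c"));
}
```

Because split() appends an extra newline element, the reconstituted string should be the input followed by that newline rather than an exact copy.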
CLASS Parser.C.Token |
Represents a C token, along with a selection of associated data and operations.
int Parser.C.Token.line
The line where the token was found.
string Parser.C.Token.text
The actual token.
string Parser.C.Token.file
The file in which the token was found.
string Parser.C.Token.trailing_whitespaces
Trailing whitespaces.
void Parser.C.Token(string text, void|int line, void|string file, void|string trailing_whitespace)
string _sprintf(int how)
If the object is printed as %s it will only output its text contents.
int `==(mixed foo)
Tokens are considered equal if the text contents are equal. It is also possible to compare the Token object with a text string directly.
string `+(string ... s)
A string can be added to the Token; the result is the text contents followed by the string.
string ``+(string ... s)
A Token can also be added to a string; the result is the string followed by the text contents.
mixed cast(string to)
It is possible to cast a Token object to a string. The text content will be returned.
int|string `[](int a, void|int b)
Characters and ranges may be indexed from the text contents of the token.
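The operators described above let a Token stand in for its text in most string contexts. A brief sketch, with the helper names describe and is_keyword invented for illustration:

```pike
// Format a token together with its position. With %s only the
// text contents of the Token are printed.
string describe(Parser.C.Token t)
{
  return sprintf("%s at %s:%d", t, t->file, t->line);
}

// Compare a Token directly against a plain string, and show that
// a cast to string gives the same text.
int(0..1) is_keyword(Parser.C.Token t, string keyword)
{
  return t == keyword && (string)t == keyword;
}
```

For example, describe(Parser.C.Token("return", 12, "main.c")) formats the token text followed by its file and line.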