In recent R versions the parser can attach source code location
information to the parsed expressions. This information is often
useful for static analysis, e.g. code linting. It can be accessed
via the getParseData function.
Arguments
- x
an expression returned from parse, or a function or other object with source reference information
- includeText
logical; whether to include the text of parsed items in the result
- pretty
logical; whether to pretty-indent the XML output. Indenting adds a small overhead, which probably only matters for very large source files.
Details
xml_parse_data converts this information to an XML tree.
The R parser's token names are preserved in the XML as much as
possible, but some of them are not valid XML tag names, so they are
renamed; see the xml_parse_token_map vector for the mapping.
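As a small sketch of how the mapping can be inspected, the snippet below looks up a few punctuation tokens. It assumes that xml_parse_token_map is a named character vector whose names are the parser's token names (quoted as in the getParseData output, e.g. '(') and whose values are the XML tag names.

```r
library(xmlparsedata)

# Punctuation tokens appear quoted in getParseData(), e.g. "'('",
# and are assumed to be the names of the mapping vector.
tokens <- c("'('", "','", "'+'")

# Look up the XML-safe tag name for each parser token
renamed <- xml_parse_token_map[tokens]
print(renamed)
```

Tokens that are already valid XML tag names, such as SYMBOL or NUM_CONST, are used unchanged.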
The top XML tag is <exprlist>, which is a list of
expressions; each expression is an <expr> tag. Each tag
has attributes that define the location: line1, col1,
line2, col2. These names are taken from the column names of the
getParseData data frame.
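The location attributes can be read back from the XML. The following is a minimal sketch that assumes the xml2 package is installed; the two-line input and the variable names are made up for illustration.

```r
library(xmlparsedata)
library(xml2)

# Hypothetical two-statement input, used only for illustration
code <- "x <- 1\ny <- x + 2\n"
expr <- parse(text = code, keep.source = TRUE)
doc <- read_xml(xml_parse_data(expr))

# Direct <expr> children of <exprlist> are the top-level expressions
top <- xml_find_all(doc, "/exprlist/expr")

# Attribute values are strings, so convert them to integers
locations <- data.frame(
  line1 = as.integer(xml_attr(top, "line1")),
  col1  = as.integer(xml_attr(top, "col1")),
  line2 = as.integer(xml_attr(top, "line2")),
  col2  = as.integer(xml_attr(top, "col2"))
)
print(locations)
```

Each row describes where one top-level expression starts and ends in the source text.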
See an example below. See also the README at
https://github.com/r-lib/xmlparsedata#readme
for examples on how to search the XML tree with the xml2 package
and XPath expressions.
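A short sketch of such a search, assuming the xml2 package is installed: it collects the names of the formal arguments of a function by selecting all SYMBOL_FORMALS tags. The XPath expression is only one possible query and is not taken from the README.

```r
library(xmlparsedata)
library(xml2)

code <- "function(a = 1, b = 2) {\n  a + b\n}\n"
expr <- parse(text = code, keep.source = TRUE)

# Turn the XML string into an xml2 document
doc <- read_xml(xml_parse_data(expr))

# Select every formal argument name, anywhere in the tree
formal_nodes <- xml_find_all(doc, "//SYMBOL_FORMALS")
formal_names <- xml_text(formal_nodes)
print(formal_names)
#> [1] "a" "b"
```

The same pattern works for any tag name that appears in the tree, for example //SYMBOL or //NUM_CONST.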
Note that `xml_parse_data()` silently drops all control characters (0x01-0x1f) from the input, except horizontal tab (0x09) and newline (0x0a), because they are invalid in XML 1.0.
See also
xml_parse_token_map for the token names.
https://github.com/r-lib/xmlparsedata#readme for more information and use cases.
Examples
library(xmlparsedata)
# The body line is indented by two spaces, matching col1 = 3 in the output
code <- "function(a = 1, b = 2) {\n  a + b\n}\n"
expr <- parse(text = code, keep.source = TRUE)
# The base R way:
getParseData(expr)
#>    line1 col1 line2 col2 id parent          token terminal     text
#> 33     1    1     3    1 33      0           expr    FALSE
#> 1      1    1     1    8  1     33       FUNCTION     TRUE function
#> 2      1    9     1    9  2     33            '('     TRUE        (
#> 3      1   10     1   10  3     33 SYMBOL_FORMALS     TRUE        a
#> 4      1   12     1   12  4     33     EQ_FORMALS     TRUE        =
#> 5      1   14     1   14  5      6      NUM_CONST     TRUE        1
#> 6      1   14     1   14  6     33           expr    FALSE
#> 7      1   15     1   15  7     33            ','     TRUE        ,
#> 10     1   17     1   17 10     33 SYMBOL_FORMALS     TRUE        b
#> 11     1   19     1   19 11     33     EQ_FORMALS     TRUE        =
#> 12     1   21     1   21 12     13      NUM_CONST     TRUE        2
#> 13     1   21     1   21 13     33           expr    FALSE
#> 14     1   22     1   22 14     33            ')'     TRUE        )
#> 30     1   24     3    1 30     33           expr    FALSE
#> 17     1   24     1   24 17     30            '{'     TRUE        {
#> 25     2    3     2    7 25     30           expr    FALSE
#> 19     2    3     2    3 19     21         SYMBOL     TRUE        a
#> 21     2    3     2    3 21     25           expr    FALSE
#> 20     2    5     2    5 20     25            '+'     TRUE        +
#> 22     2    7     2    7 22     24         SYMBOL     TRUE        b
#> 24     2    7     2    7 24     25           expr    FALSE
#> 28     3    1     3    1 28     30            '}'     TRUE        }
cat(xml_parse_data(expr, pretty = TRUE))
#> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
#> <exprlist>
#>   <expr line1="1" col1="1" line2="3" col2="1" start="26" end="76">
#>     <FUNCTION line1="1" col1="1" line2="1" col2="8" start="26" end="33">function</FUNCTION>
#>     <OP-LEFT-PAREN line1="1" col1="9" line2="1" col2="9" start="34" end="34">(</OP-LEFT-PAREN>
#>     <SYMBOL_FORMALS line1="1" col1="10" line2="1" col2="10" start="35" end="35">a</SYMBOL_FORMALS>
#>     <EQ_FORMALS line1="1" col1="12" line2="1" col2="12" start="37" end="37">=</EQ_FORMALS>
#>     <expr line1="1" col1="14" line2="1" col2="14" start="39" end="39">
#>       <NUM_CONST line1="1" col1="14" line2="1" col2="14" start="39" end="39">1</NUM_CONST>
#>     </expr>
#>     <OP-COMMA line1="1" col1="15" line2="1" col2="15" start="40" end="40">,</OP-COMMA>
#>     <SYMBOL_FORMALS line1="1" col1="17" line2="1" col2="17" start="42" end="42">b</SYMBOL_FORMALS>
#>     <EQ_FORMALS line1="1" col1="19" line2="1" col2="19" start="44" end="44">=</EQ_FORMALS>
#>     <expr line1="1" col1="21" line2="1" col2="21" start="46" end="46">
#>       <NUM_CONST line1="1" col1="21" line2="1" col2="21" start="46" end="46">2</NUM_CONST>
#>     </expr>
#>     <OP-RIGHT-PAREN line1="1" col1="22" line2="1" col2="22" start="47" end="47">)</OP-RIGHT-PAREN>
#>     <expr line1="1" col1="24" line2="3" col2="1" start="49" end="76">
#>       <OP-LEFT-BRACE line1="1" col1="24" line2="1" col2="24" start="49" end="49">{</OP-LEFT-BRACE>
#>       <expr line1="2" col1="3" line2="2" col2="7" start="53" end="57">
#>         <expr line1="2" col1="3" line2="2" col2="3" start="53" end="53">
#>           <SYMBOL line1="2" col1="3" line2="2" col2="3" start="53" end="53">a</SYMBOL>
#>         </expr>
#>         <OP-PLUS line1="2" col1="5" line2="2" col2="5" start="55" end="55">+</OP-PLUS>
#>         <expr line1="2" col1="7" line2="2" col2="7" start="57" end="57">
#>           <SYMBOL line1="2" col1="7" line2="2" col2="7" start="57" end="57">b</SYMBOL>
#>         </expr>
#>       </expr>
#>       <OP-RIGHT-BRACE line1="3" col1="1" line2="3" col2="1" start="76" end="76">}</OP-RIGHT-BRACE>
#>     </expr>
#>   </expr>
#> </exprlist>