Readme updates
112
README.md
@@ -37,16 +37,12 @@ _mpc_ provides a number of features that this project does not offer, and also o
* _mpc_ Doesn't pollute the namespace


Demonstration
=============
Quickstart
==========

In the below example I create a parser for basic mathematical expressions.
Here is how one would use _mpc_ to create a parser for a basic mathematical expression language.

```c
#include "mpc.h"

void parse_maths(const char *input) {

mpc_parser_t *Expr = mpc_new("expression");
mpc_parser_t *Prod = mpc_new("product");
mpc_parser_t *Value = mpc_new("value");
@@ -63,7 +59,7 @@ void parse_maths(const char *input) {

mpc_result_t r;

if (!mpc_parse("<parse_maths>", input, Maths, &r)) {
if (mpc_parse("input", input, Maths, &r)) {
mpc_ast_print(r.output);
mpc_ast_delete(r.output);
} else {
@@ -72,10 +68,9 @@ void parse_maths(const char *input) {
}

mpc_cleanup(4, Expr, Prod, Value, Maths);
}
```

If you were to input `"(4 * 2 * 11 + 2) - 5"` into this function, the output would look something like this:
If you were to set `input` to the string `(4 * 2 * 11 + 2) - 5`, the printed output would look like this.

```
>:
@@ -404,7 +399,7 @@ typedef mpc_val_t*(*mpc_fold_t)(int,mpc_val_t**);
This takes a list of pointers to data values and must return some combined or folded version of these data values. It must free any input data that is no longer used once the combination has taken place.
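
As an illustration of that contract, here is a minimal sketch of a fold function that sums integer results. The names `fold_sum` and `boxed_int` are hypothetical, and the two typedefs are reproduced locally, on the assumption that they match `mpc.h`, only so that the sketch compiles without the library.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: these mirror the typedefs in mpc.h. They are repeated here
   only so that the sketch is self-contained. */
typedef void mpc_val_t;
typedef mpc_val_t *(*mpc_fold_t)(int, mpc_val_t **);

/* Hypothetical fold: adds n heap-allocated ints into xs[0], frees the
   other inputs, and returns xs[0] as the combined value. */
mpc_val_t *fold_sum(int n, mpc_val_t **xs) {
  int i;
  int *total;

  assert(n > 0);
  total = xs[0];

  for (i = 1; i < n; i++) {
    *total += *(int *)xs[i];
    free(xs[i]); /* this input is no longer used after combination */
  }

  return total;
}

/* Hypothetical helper: allocates an int on the heap. */
int *boxed_int(int v) {
  int *p = malloc(sizeof *p);
  if (p != NULL) { *p = v; }
  return p;
}
```

The first input is reused as the output and every other input is freed, so nothing is leaked once the fold returns. The caller remains responsible for freeing the returned value.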


Case Study - C Identifier
Case Study - Identifier
=========================

Combinator Method
@@ -526,35 +521,35 @@ Library Reference
Common Parsers
--------------

* `mpc_soi(void);` Matches only the start of input, returns `NULL`
* `mpc_eoi(void);` Matches only the end of input, returns `NULL`
* `mpc_whitespace(void);` Matches any whitespace character `" \f\n\r\t\v"`
* `mpc_whitespaces(void);` Matches zero or more whitespace characters
* `mpc_blank(void);` Matches whitespaces and frees the result, returns `NULL`
* `mpc_newline(void);` Matches `'\n'`
* `mpc_tab(void);` Matches `'\t'`
* `mpc_escape(void);` Matches a backslash followed by any character
* `mpc_digit(void);` Matches any character in the range `'0'` - `'9'`
* `mpc_hexdigit(void);` Matches any character in the range `'0'` - `'9'` as well as `'A'` - `'F'` and `'a'` - `'f'`
* `mpc_octdigit(void);` Matches any character in the range `'0'` - `'7'`
* `mpc_digits(void);` Matches one or more digits
* `mpc_hexdigits(void);` Matches one or more hexdigits
* `mpc_octdigits(void);` Matches one or more octdigits
* `mpc_lower(void);` Matches any lower case character
* `mpc_upper(void);` Matches any upper case character
* `mpc_alpha(void);` Matches any alphabet character
* `mpc_underscore(void);` Matches `'_'`
* `mpc_alphanum(void);` Matches any alphabet character, underscore or digit
* `mpc_int(void);` Matches digits and returns an `int*`
* `mpc_hex(void);` Matches hexdigits and returns an `int*`
* `mpc_oct(void);` Matches octdigits and returns an `int*`
* `mpc_number(void);` Matches `mpc_int`, `mpc_hex` or `mpc_oct`
* `mpc_real(void);` Matches some floating point number as a string
* `mpc_float(void);` Matches some floating point number and returns a `float*`
* `mpc_char_lit(void);` Matches some character literal surrounded by `'`
* `mpc_string_lit(void);` Matches some string literal surrounded by `"`
* `mpc_regex_lit(void);` Matches some regex literal surrounded by `/`
* `mpc_ident(void);` Matches a C style identifier
* `mpc_soi` Matches only the start of input, returns `NULL`
* `mpc_eoi` Matches only the end of input, returns `NULL`
* `mpc_whitespace` Matches any whitespace character `" \f\n\r\t\v"`
* `mpc_whitespaces` Matches zero or more whitespace characters
* `mpc_blank` Matches whitespaces and frees the result, returns `NULL`
* `mpc_newline` Matches `'\n'`
* `mpc_tab` Matches `'\t'`
* `mpc_escape` Matches a backslash followed by any character
* `mpc_digit` Matches any character in the range `'0'` - `'9'`
* `mpc_hexdigit` Matches any character in the range `'0'` - `'9'` as well as `'A'` - `'F'` and `'a'` - `'f'`
* `mpc_octdigit` Matches any character in the range `'0'` - `'7'`
* `mpc_digits` Matches one or more digits
* `mpc_hexdigits` Matches one or more hexdigits
* `mpc_octdigits` Matches one or more octdigits
* `mpc_lower` Matches any lower case character
* `mpc_upper` Matches any upper case character
* `mpc_alpha` Matches any alphabet character
* `mpc_underscore` Matches `'_'`
* `mpc_alphanum` Matches any alphabet character, underscore or digit
* `mpc_int` Matches digits and returns an `int*`
* `mpc_hex` Matches hexdigits and returns an `int*`
* `mpc_oct` Matches octdigits and returns an `int*`
* `mpc_number` Matches `mpc_int`, `mpc_hex` or `mpc_oct`
* `mpc_real` Matches some floating point number as a string
* `mpc_float` Matches some floating point number and returns a `float*`
* `mpc_char_lit` Matches some character literal surrounded by `'`
* `mpc_string_lit` Matches some string literal surrounded by `"`
* `mpc_regex_lit` Matches some regex literal surrounded by `/`
* `mpc_ident` Matches a C style identifier


Useful Parsers
@@ -647,38 +642,31 @@ mpc_val_t* fold_maths(int n, mpc_val_t **xs) {
And then we use this to specify a basic grammar, which folds together any results.

```c
int parse_maths(char* input) {

mpc_parser_t *Expr = mpc_new("expr");
mpc_parser_t *Factor = mpc_new("factor");
mpc_parser_t *Term = mpc_new("term");
mpc_parser_t *Maths = mpc_new("maths");

mpc_define(Expr, mpc_or(2,
mpc_and(3, fold_maths, Factor, mpc_oneof("*/"), Factor, free, free),
mpc_and(3, fold_maths,
Factor, mpc_oneof("*/"), Factor,
free, free),
Factor
));

mpc_define(Factor, mpc_or(2,
mpc_and(3, fold_maths, Term, mpc_oneof("+-"), Term, free, free),
mpc_and(3, fold_maths,
Term, mpc_oneof("+-"), Term,
free, free),
Term
));

mpc_define(Term, mpc_or(2, mpc_int(), mpc_parens(Expr, free)));
mpc_define(Maths, mpc_enclose(Expr, free));
mpc_define(Maths, mpc_whole(Expr, free));

mpc_result_t r;
if (!mpc_parse("parse_maths", input, Maths, &r)) {
mpc_err_print(r.error);
abort();
}
/* Do Some Parsing... */

int result = *r.output;
printf("Result: %i\n", result);
free(r.output);

return result;
}
mpc_delete(Maths);
```

If we supply this function with something like `(4*2)+5`, we can expect it to output `13`.
@@ -709,7 +697,7 @@ String literals are surrounded in double quotes `"`. Character literals in singl

Parts specified one after another are parsed in order (like `mpc_and`), while parts separated by a pipe `|` are alternatives (like `mpc_or`). Parentheses `()` are used to specify precedence. `*` means zero or more of, `+` one or more of, `?` zero or one of, and `!` negation. A number inside braces, such as `{5}`, means that many counts of.

Rules are specified by rule name, optionally followed by an _expected_ string, followed by a colon `:`, followed by the definition, and ending in a semicolon `;`.
Rules are specified by rule name, optionally followed by an _expected_ string, followed by a colon `:`, followed by the definition, and ending in a semicolon `;`. The flags variable is a set of flags `MPC_LANG_DEFAULT`, `MPC_LANG_PREDICTIVE`, or `MPC_LANG_WHITESPACE_SENSITIVE`, which specify whether the language is predictive or whitespace sensitive.

Like with the regular expressions, this user input is parsed by existing parts of the _mpc_ library. It provides one of the more powerful features of the library.

@@ -719,7 +707,7 @@ Like with the regular expressions, this user input is parsed by existing parts o
mpc_parser_t *mpca_grammar(int flags, const char *grammar, ...);
```

This takes in some single right hand side of a rule, as well as a list of any of the parsers it refers to, and outputs a parser that does exactly what is specified by the rule. The flags variable is a set of flags `MPC_LANG_DEFAULT`, `MPC_LANG_PREDICTIVE`, or `MPC_LANG_WHITESPACE_SENSITIVE`, which specify whether the language is predictive or whitespace sensitive.
This takes in some single right hand side of a rule, as well as a list of any of the parsers it refers to, and outputs a parser that does exactly what is specified by the rule.

* * *

@@ -758,14 +746,14 @@ _mpc_ provides some automatic generation of error messages. These can be enhance
Limitations & FAQ
=================

### ASCII
### Does this support Unicode?

Only supports ASCII. Sorry!
_mpc_ only supports ASCII. Sorry! Contributions are welcome, as making the library support Unicode is non-trivial.


### Backtracking and Left Recursion

MPC supports backtracking, but will not completely backtrack up a parse tree if it encounters some success on the path it is going. To demonstrate this behaviour examine the following erroneous grammar, intended to parse either a C style identifier, or a C style function call.
_mpc_ supports backtracking, but will not completely backtrack up a parse tree if it encounters some success on the path it is going. To demonstrate this behaviour examine the following erroneous grammar, intended to parse either a C style identifier, or a C style function call.

```
factor : <ident>
@@ -788,7 +776,7 @@ factor : <ident> ('(' <expr>? (',' <expr>)* ')')? ;
```


### Max String Length
### How can I avoid the maximum string literal length?

Some compilers limit the maximum length of string literals. If you have a huge language string in the source file to be passed into `mpca_lang` you might encounter this. The ANSI C standard only guarantees support for string literals of up to 509 characters. Most compilers support more than this. Visual Studio supports up to 2048 characters, while gcc allocates memory dynamically and so has no real limit.
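
One possible way around the limit is to keep the language string out of the source file and load it at run time. The following is a minimal sketch under that assumption. `read_grammar_file` is a hypothetical helper written for this example and is not part of _mpc_.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of mpc: reads a whole file into a
   NUL-terminated heap buffer. Returns NULL on failure; the caller frees. */
char *read_grammar_file(const char *path) {
  FILE *f = fopen(path, "rb");
  long size;
  char *buf;

  if (f == NULL) { return NULL; }

  /* Find the file size by seeking to the end. */
  if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
  size = ftell(f);
  if (size < 0) { fclose(f); return NULL; }
  rewind(f);

  buf = malloc((size_t)size + 1);
  if (buf == NULL) { fclose(f); return NULL; }

  /* Read the contents in one go and terminate the string. */
  if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
    free(buf);
    fclose(f);
    return NULL;
  }
  fclose(f);

  buf[size] = '\0';
  return buf;
}
```

The returned buffer could then be passed wherever the string literal would otherwise have gone, and freed once it is no longer needed.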