andy/mpc

Micro Parser Combinators

Version 0.9.0

About

mpc is a lightweight and powerful Parser Combinator library for C.

Using mpc might be of interest to you if you are...

Building a new programming language
Building a new data format
Parsing an existing programming language
Parsing an existing data format
Embedding a Domain Specific Language
Implementing Greenspun's Tenth Rule

Features

Type-Generic
Predictive, Recursive Descent
Easy to Integrate (One Source File in ANSI C)
Automatic Error Message Generation
Regular Expression Parser Generator
Language/Grammar Parser Generator

Alternatives

The current main alternative for a C based parser combinator library is a branch of Cesium3.

mpc provides a number of features that this project does not offer, and also overcomes a number of potential downsides:

mpc Works for Generic Types
mpc Doesn't rely on Boehm-Demers-Weiser Garbage Collection
mpc Doesn't use setjmp and longjmp for errors
mpc Doesn't pollute the namespace

Quickstart

Here is how one would use mpc to create a parser for a basic mathematical expression language.

mpc_parser_t *Expr  = mpc_new("expression");
mpc_parser_t *Prod  = mpc_new("product");
mpc_parser_t *Value = mpc_new("value");
mpc_parser_t *Maths = mpc_new("maths");

mpca_lang(MPCA_LANG_DEFAULT,
  " expression : <product> (('+' | '-') <product>)*; "
  " product    : <value>   (('*' | '/')   <value>)*; "
  " value      : /[0-9]+/ | '(' <expression> ')';    "
  " maths      : /^/ <expression> /$/;               ",
  Expr, Prod, Value, Maths, NULL);

mpc_result_t r;

if (mpc_parse("input", input, Maths, &r)) {
  mpc_ast_print(r.output);
  mpc_ast_delete(r.output);
} else {
  mpc_err_print(r.error);
  mpc_err_delete(r.error);
}

mpc_cleanup(4, Expr, Prod, Value, Maths);

If you were to set input to the string (4 * 2 * 11 + 2) - 5, the printed output would look like this.

>
  regex
  expression|>
    value|>
      char:1:1 '('
      expression|>
        product|>
          value|regex:1:2 '4'
          char:1:4 '*'
          value|regex:1:6 '2'
          char:1:8 '*'
          value|regex:1:10 '11'
        char:1:13 '+'
        product|value|regex:1:15 '2'
      char:1:16 ')'
    char:1:18 '-'
    product|value|regex:1:20 '5'
  regex

Getting Started

Introduction

Parser Combinators are structures that encode how to parse particular languages. They can be combined using intuitive operators to create new parsers of increasing complexity. Using these operators, detailed grammars and languages can be parsed and processed quickly, efficiently, and easily.

The trick behind Parser Combinators is the observation that by structuring the library in a particular way, one can make building parser combinators look like writing a grammar itself. Therefore, instead of describing how to parse a language, a user need only specify the language itself, and the library will work out how to parse it ... as if by magic!

mpc can be used in this mode, or, as shown in the above example, you can specify the grammar directly as a string or in a file.


Basic Parsers

String Parsers

All the following functions construct new basic parsers of the type mpc_parser_t *. All of those parsers return a newly allocated char * with the character(s) they manage to match. If unsuccessful they will return an error. They have the following functionality.


mpc_parser_t *mpc_any(void);

Matches any individual character


mpc_parser_t *mpc_char(char c);

Matches a single given character c


mpc_parser_t *mpc_range(char s, char e);

Matches any single given character in the range s to e (inclusive)


mpc_parser_t *mpc_oneof(const char *s);

Matches any single given character in the string s


mpc_parser_t *mpc_noneof(const char *s);

Matches any single given character not in the string s


mpc_parser_t *mpc_satisfy(int(*f)(char));

Matches any single given character satisfying function f


mpc_parser_t *mpc_string(const char *s);

Matches exactly the string s

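To illustrate the predicate shape that mpc_satisfy takes, here is a minimal sketch. The function name is_vowel is hypothetical and not part of mpc; only the int(*)(char) signature comes from the declaration above.

```c
#include <ctype.h>

/* Hypothetical predicate with the int(*)(char) shape taken by mpc_satisfy.
   Returns 1 for ASCII vowels (either case) and 0 for anything else. */
int is_vowel(char c) {
  /* Cast to unsigned char first: passing a negative char to tolower
     is undefined behaviour. */
  switch (tolower((unsigned char)c)) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
      return 1;
    default:
      return 0;
  }
}
```

Assuming mpc is linked in, such a predicate would be passed as mpc_satisfy(is_vowel).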

Other Parsers

Several other functions exist that construct parsers with some other special functionality.


mpc_parser_t *mpc_pass(void);

Consumes no input, always successful, returns NULL


mpc_parser_t *mpc_fail(const char *m);
mpc_parser_t *mpc_failf(const char *fmt, ...);

Consumes no input, always fails with message m or formatted string fmt.


mpc_parser_t *mpc_lift(mpc_ctor_t f);

Consumes no input, always successful, returns the result of function f


mpc_parser_t *mpc_lift_val(mpc_val_t *x);

Consumes no input, always successful, returns x


mpc_parser_t *mpc_state(void);

Consumes no input, always successful, returns a copy of the parser state as a mpc_state_t *. This state is newly allocated, so it must be released with free once you are finished with it.

mpc_parser_t *mpc_anchor(int(*f)(char,char));

Consumes no input. Successful when function f returns true. Always returns NULL.


Function f is an anchor function. It takes as input the last character parsed and the next character in the input, and returns success or failure. This function can be set by the user to ensure some condition is met, for example to test that the input is at a boundary between words and non-words.

At the start of the input the first argument is set to '\0'. At the end of the input the second argument is set to '\0'.

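As a rough sketch of what an anchor function could look like, the following tests for a word boundary using the '\0' convention described above. The names is_word_char and word_boundary are hypothetical; mpc already lists mpc_boundary for this purpose, so this only illustrates the int(*)(char,char) shape.

```c
#include <ctype.h>

/* Letters, digits and '_' count as word characters. '\0' marks the start or
   end of input (per the convention above) and is never a word character.
   Always returns exactly 0 or 1. */
static int is_word_char(char c) {
  return c != '\0' && (isalnum((unsigned char)c) || c == '_');
}

/* Hypothetical anchor: succeeds only where exactly one of the two
   neighbouring characters is a word character. */
int word_boundary(char prev, char next) {
  return is_word_char(prev) != is_word_char(next);
}
```

Assuming mpc is linked in, this would be passed as mpc_anchor(word_boundary).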

Parsing

Once you've built a parser, you can run it on some input using one of the following functions. These functions return 1 on success and 0 on failure. They output either the result or an error to a mpc_result_t variable. This type is defined as follows.

typedef union {
  mpc_err_t *error;
  mpc_val_t *output;
} mpc_result_t;

where mpc_val_t * is synonymous with void * and simply represents some pointer to data, the exact type of which is dependent on the parser.

int mpc_parse(const char *filename, const char *string, mpc_parser_t *p, mpc_result_t *r);

Run a parser on some string.


int mpc_parse_file(const char *filename, FILE *file, mpc_parser_t *p, mpc_result_t *r);

Run a parser on some file.


int mpc_parse_pipe(const char *filename, FILE *pipe, mpc_parser_t *p, mpc_result_t *r);

Run a parser on some pipe (such as stdin).


int mpc_parse_contents(const char *filename, mpc_parser_t *p, mpc_result_t *r);

Run a parser on the contents of some file.


Combinators

Combinators are functions that take one or more parsers and return a new parser of some given functionality.


These combinators work independently of exactly what data type the input parser(s) return. In languages such as Haskell, the compiler ensures that you don't feed one type of data into a parser requiring a different type. In C we don't have that luxury, so it is up to the programmer to deal correctly with the outputs of different parser types.

A second annoyance in C is manual memory management. Some parsers might get half-way and then fail. This means they need to clean up any partial result that has been collected in the parse. In Haskell this is handled by the Garbage Collector, but in C these combinators need to take destructor functions as input, which say how to clean up any partial data that has been collected.

Here are the main combinators and how to use them.

mpc_parser_t *mpc_expect(mpc_parser_t *a, const char *e);
mpc_parser_t *mpc_expectf(mpc_parser_t *a, const char *fmt, ...);

Returns a parser that runs a, and on success returns the result of a, while on failure reports that e was expected.


mpc_parser_t *mpc_apply(mpc_parser_t *a, mpc_apply_t f);
mpc_parser_t *mpc_apply_to(mpc_parser_t *a, mpc_apply_to_t f, void *x);

Returns a parser that applies function f (optionally taking extra input x) to the result of parser a.

mpc_parser_t *mpc_check(mpc_parser_t *a, mpc_dtor_t da, mpc_check_t f, const char *e);
mpc_parser_t *mpc_check_with(mpc_parser_t *a, mpc_dtor_t da, mpc_check_with_t f, void *x, const char *e);
mpc_parser_t *mpc_checkf(mpc_parser_t *a, mpc_dtor_t da, mpc_check_t f, const char *fmt, ...);
mpc_parser_t *mpc_check_withf(mpc_parser_t *a, mpc_dtor_t da, mpc_check_with_t f, void *x, const char *fmt, ...);

Returns a parser that applies function f (optionally taking extra input x) to the result of parser a. If f returns non-zero, then the parser succeeds and returns the value of a (possibly modified by f). If f returns zero, then the parser fails with message e, and the result of a is destroyed with the destructor da.


mpc_parser_t *mpc_not(mpc_parser_t *a, mpc_dtor_t da);
mpc_parser_t *mpc_not_lift(mpc_parser_t *a, mpc_dtor_t da, mpc_ctor_t lf);

Returns a parser with the following behaviour. If parser a succeeds, then it fails and consumes no input. If parser a fails, then it succeeds, consumes no input and returns NULL (or the result of lift function lf). Destructor da is used to destroy the result of a on success.


mpc_parser_t *mpc_maybe(mpc_parser_t *a);
mpc_parser_t *mpc_maybe_lift(mpc_parser_t *a, mpc_ctor_t lf);

Returns a parser that runs a. If a is successful then it returns the result of a. If a is unsuccessful then it succeeds, but returns NULL (or the result of lf).


mpc_parser_t *mpc_many(mpc_fold_t f, mpc_parser_t *a);

Runs a zero or more times until it fails. Results are combined using fold function f. See the Function Types section for more details.


mpc_parser_t *mpc_many1(mpc_fold_t f, mpc_parser_t *a);

Runs a one or more times until it fails. Results are combined with fold function f.


mpc_parser_t *mpc_sepby1(mpc_fold_t f, mpc_parser_t *sep, mpc_parser_t *a);

Runs a one or more times, separated by sep. Results are combined with fold function f.


mpc_parser_t *mpc_count(int n, mpc_fold_t f, mpc_parser_t *a, mpc_dtor_t da);

Runs a exactly n times. If this fails, any partial results are destructed with da. If successful, the results of a are combined using fold function f.

mpc_parser_t *mpc_or(int n, ...);

Attempts to run n parsers in sequence, returning the result of the first one that succeeds. If all fail, returns an error.

mpc_parser_t *mpc_and(int n, mpc_fold_t f, ...);

Attempts to run n parsers in sequence, returning the fold of the results using fold function f. The parsers must be specified first, followed by destructors for each parser excluding the final one. These are used in case of partial success. For example: mpc_and(3, mpcf_strfold, mpc_char('a'), mpc_char('b'), mpc_char('c'), free, free); would attempt to match 'a' followed by 'b' followed by 'c', and if successful would concatenate them using mpcf_strfold. Otherwise it would use free on the partial results.

mpc_parser_t *mpc_predictive(mpc_parser_t *a);

Returns a parser that runs a with backtracking disabled. This means if a consumes more than one character, it will not be reverted, even on failure. Turning backtracking off has good performance benefits for grammars which are LL(1). These are grammars where the first character completely determines the parse result - such as the decision of parsing either a C identifier, number, or string literal. This option should not be used for non LL(1) grammars or it will produce incorrect results or crash the parser.


Another way to think of mpc_predictive is that it can be applied to a parser (for a performance improvement) if either successfully parsing the first character will result in a completely successful parse, or all of the referenced sub-parsers are also LL(1).


Function Types

The combinator functions take a number of special function types as function pointers. Here is a short explanation of those types and how they are expected to behave. It is important that these behave correctly, otherwise it is easy to introduce memory leaks or crashes into the system.

typedef void(*mpc_dtor_t)(mpc_val_t*);

Given some pointer to a data value it will ensure the memory it points to is freed correctly.


typedef mpc_val_t*(*mpc_ctor_t)(void);

Returns some data value when called. It can be used to create empty versions of data types when certain combinators have no known default value to return. For example it may be used to return a newly allocated empty string.

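As a minimal sketch of a matching constructor and destructor pair, the following returns and releases an empty string. The names ctor_empty_str and dtor_str are hypothetical, and the local typedef is only a stand-in for the one in mpc.h so that the fragment compiles on its own; mpc itself lists mpcf_ctor_str in its library reference.

```c
#include <stdlib.h>

/* Stand-in for mpc's typedef so this compiles without mpc.h.
   The text above describes mpc_val_t * as synonymous with void *. */
typedef void mpc_val_t;

/* Constructor (mpc_ctor_t shape): returns a newly allocated empty string,
   or NULL if the allocation fails. */
mpc_val_t *ctor_empty_str(void) {
  return calloc(1, 1);
}

/* Destructor (mpc_dtor_t shape): releases a value made by ctor_empty_str. */
void dtor_str(mpc_val_t *x) {
  free(x);
}
```

The pairing matters: whatever a constructor allocates, the destructor handed to the same combinator should know how to release.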

typedef mpc_val_t*(*mpc_apply_t)(mpc_val_t*);
typedef mpc_val_t*(*mpc_apply_to_t)(mpc_val_t*,void*);

This takes in some pointer to data and outputs some new or modified pointer to data, making sure to free the input data if it is no longer used. The apply_to variation takes in an extra pointer to some data such as global state.
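The ownership rule for apply functions can be sketched as follows: the input string is consumed and a new value is returned in its place. The name str_to_int is hypothetical and the typedef is a stand-in for mpc.h; the library reference lists mpcf_int for this kind of conversion.

```c
#include <stdlib.h>

/* Stand-in for mpc's typedef so this compiles without mpc.h. */
typedef void mpc_val_t;

/* Hypothetical apply function (mpc_apply_t shape): converts a heap-allocated
   decimal string into a heap-allocated int. The input string is freed because
   nothing refers to it after the conversion. Returns NULL if the allocation
   for the result fails. */
mpc_val_t *str_to_int(mpc_val_t *x) {
  int *result = malloc(sizeof(int));
  if (result != NULL) {
    *result = (int)strtol((const char *)x, NULL, 10);
  }
  free(x);
  return result;
}
```

Because the function frees its argument, the caller must not touch the original string afterwards.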

typedef int(*mpc_check_t)(mpc_val_t**);
typedef int(*mpc_check_with_t)(mpc_val_t**,void*);

This takes in some pointer to data and outputs 0 if parsing should stop with an error. Additionally, this may change or free the input data. The check_with variation takes in an extra pointer to some data such as global state.

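A check function might look like the following sketch, which accepts a parsed int * only when it fits in a byte. The name check_is_byte is hypothetical and the typedef is a stand-in for mpc.h. The function leaves the value untouched, on the assumption that the destructor given to mpc_check releases it on failure.

```c
#include <stddef.h>

/* Stand-in for mpc's typedef so this compiles without mpc.h. */
typedef void mpc_val_t;

/* Hypothetical check function (mpc_check_t shape). *x is expected to point
   to an int. Returns 1 when the value lies in 0..255, and 0 otherwise, which
   tells the combinator to fail with its error message. */
int check_is_byte(mpc_val_t **x) {
  const int *value = (const int *)*x;
  return value != NULL && *value >= 0 && *value <= 255;
}
```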

typedef mpc_val_t*(*mpc_fold_t)(int,mpc_val_t**);

This takes a list of pointers to data values and must return some combined or folded version of these data values. It must free any input data that is no longer used once the combination has taken place.
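To make the fold contract concrete, here is a sketch of a fold that sums heap-allocated ints. The name fold_sum is hypothetical and the typedef is a stand-in for mpc.h. The first element is reused as the result and every other element is freed, so no input is leaked or freed twice.

```c
#include <stdlib.h>

/* Stand-in for mpc's typedef so this compiles without mpc.h. */
typedef void mpc_val_t;

/* Hypothetical fold (mpc_fold_t shape): each xs[i] points to a heap-allocated
   int. xs[0] doubles as the accumulator and is returned; xs[1..n-1] are freed
   once added. Returns NULL when there is nothing to fold. */
mpc_val_t *fold_sum(int n, mpc_val_t **xs) {
  int i;
  int *total;
  if (n == 0) { return NULL; }
  total = xs[0];
  for (i = 1; i < n; i++) {
    *total += *(int *)xs[i];
    free(xs[i]);
  }
  return total;
}
```

Returning NULL for an empty list is a choice made for this sketch; a fold used with mpc_many may prefer to return a freshly allocated zero instead.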

Case Study - Identifier

Combinator Method

Using the above combinators we can create a parser that matches a C identifier.


When using the combinators we need to supply a function that says how to combine two char *.


For this we build a fold function that will concatenate zero or more strings together. For the sake of this tutorial we will write it by hand, but this (as well as many other useful fold functions) is actually included in mpc under the mpcf_* namespace, such as mpcf_strfold.

mpc_val_t *strfold(int n, mpc_val_t **xs) {
  char *x = calloc(1, 1);
  int i;
  for (i = 0; i < n; i++) {
    x = realloc(x, strlen(x) + strlen(xs[i]) + 1);
    strcat(x, xs[i]);
    free(xs[i]);
  }
  return x;
}

We can use this to specify a C identifier, making use of some combinators to say how the basic parsers are combined.


mpc_parser_t *alpha = mpc_or(2, mpc_range('a', 'z'), mpc_range('A', 'Z'));
mpc_parser_t *digit = mpc_range('0', '9');
mpc_parser_t *underscore = mpc_char('_');

mpc_parser_t *ident = mpc_and(2, strfold,
  mpc_or(2, alpha, underscore),
  /* alpha and underscore are needed twice, so pass copies the second time */
  mpc_many(strfold, mpc_or(3, mpc_copy(alpha), digit, mpc_copy(underscore))),
  free);

/* Do Some Parsing... */

mpc_delete(ident);

Notice that previous parsers are used as input to new parsers we construct from the combinators. Note that only the final parser ident must be deleted. When we input a parser into a combinator we should consider it to be part of the output of that combinator.


Because of this we shouldn't create a parser and input it into multiple places, or it will be freed twice. When the same parser is needed in more than one place, pass a copy made with mpc_copy (described under Parser References).

Regex Method

There is an easier way to do this than the above method. mpc comes with a handy regex function for constructing parsers using regex syntax. We can specify an identifier using a regex pattern as shown below.


mpc_parser_t *ident = mpc_re("[a-zA-Z_][a-zA-Z_0-9]*");

/* Do Some Parsing... */

mpc_delete(ident);

Library Method

That said, if we really wanted to create a parser for C identifiers, a function for creating this parser comes included in mpc along with many other common parsers.

mpc_parser_t *ident = mpc_ident();

/* Do Some Parsing... */

mpc_delete(ident);

Parser References

Building parsers in the above way can have issues with self-reference or cyclic-reference. To overcome this we can separate the construction of parsers into two different steps: construction and definition.

mpc_parser_t *mpc_new(const char *name);

This will construct a parser called name which can then be used as input to others, including itself, without fear of being deleted. Any parser created using mpc_new is said to be retained. This means it will behave differently to a normal parser when referenced. When deleting a parser that includes a retained parser, the retained parser will not be deleted along with it. To delete a retained parser mpc_delete must be used on it directly.


A retained parser can then be defined using...


mpc_parser_t *mpc_define(mpc_parser_t *p, mpc_parser_t *a);

This assigns the contents of parser a to p, and deletes a. With this technique parsers can now reference each other, as well as themselves, without trouble.


mpc_parser_t *mpc_undefine(mpc_parser_t *p);

A final step is required. Parsers that reference each other must all be undefined before they are deleted. It is important to do any undefining before deletion. The reason for this is that to delete a parser it must look at each sub-parser that is used by it. If any of these have already been deleted a segfault is unavoidable - even if they were retained beforehand.


void mpc_cleanup(int n, ...);

To ease the task of undefining and then deleting parsers, mpc_cleanup can be used. It takes n parsers as input and undefines them all before deleting them all.

mpc_parser_t *mpc_copy(mpc_parser_t *a);

This function makes a copy of a parser a. This can be useful when you want to use a parser as input for some other parsers multiple times without retaining it.


mpc_parser_t *mpc_re(const char *re);
mpc_parser_t *mpc_re_mode(const char *re, int mode);

This function takes as input the regular expression re and builds a parser for it. With the mpc_re_mode function optional mode flags can also be given.


Available flags are MPC_RE_MULTILINE / MPC_RE_M where the start of input character ^ also matches the beginning of new lines and the end of input $ character also matches new lines, and MPC_RE_DOTALL / MPC_RE_S where the any character token . also matches newlines (by default it doesn't).


Library Reference

Common Parsers

| Parser | Description |
| --- | --- |
| `mpc_soi` | Matches only the start of input, returns `NULL` |
| `mpc_eoi` | Matches only the end of input, returns `NULL` |
| `mpc_boundary` | Matches only the boundary between words, returns `NULL` |
| `mpc_boundary_newline` | Matches the start of a new line, returns `NULL` |
| `mpc_whitespace` | Matches any whitespace character `" \f\n\r\t\v"` |
| `mpc_whitespaces` | Matches zero or more whitespace characters |
| `mpc_blank` | Matches whitespaces and frees the result, returns `NULL` |
| `mpc_newline` | Matches `'\n'` |
| `mpc_tab` | Matches `'\t'` |
| `mpc_escape` | Matches a backslash followed by any character |
| `mpc_digit` | Matches any character in the range `'0'` - `'9'` |
| `mpc_hexdigit` | Matches any character in the range `'0'` - `'9'` as well as `'A'` - `'F'` and `'a'` - `'f'` |
| `mpc_octdigit` | Matches any character in the range `'0'` - `'7'` |
| `mpc_digits` | Matches one or more digits |
| `mpc_hexdigits` | Matches one or more hexdigits |
| `mpc_octdigits` | Matches one or more octdigits |
| `mpc_lower` | Matches any lower case character |
| `mpc_upper` | Matches any upper case character |
| `mpc_alpha` | Matches any alphabet character |
| `mpc_underscore` | Matches `'_'` |
| `mpc_alphanum` | Matches any alphabet character, underscore or digit |
| `mpc_int` | Matches digits and returns an `int*` |
| `mpc_hex` | Matches hexdigits and returns an `int*` |
| `mpc_oct` | Matches octdigits and returns an `int*` |
| `mpc_number` | Matches `mpc_int`, `mpc_hex` or `mpc_oct` |
| `mpc_real` | Matches some floating point number as a string |
| `mpc_float` | Matches some floating point number and returns a `float*` |
| `mpc_char_lit` | Matches some character literal surrounded by `'` |
| `mpc_string_lit` | Matches some string literal surrounded by `"` |
| `mpc_regex_lit` | Matches some regex literal surrounded by `/` |
| `mpc_ident` | Matches a C style identifier |

Useful Parsers

| Parser | Description |
| --- | --- |
| `mpc_startswith(mpc_parser_t *a);` | Matches the start of input followed by `a` |
| `mpc_endswith(mpc_parser_t *a, mpc_dtor_t da);` | Matches `a` followed by the end of input |
| `mpc_whole(mpc_parser_t *a, mpc_dtor_t da);` | Matches the start of input, `a`, and the end of input |
| `mpc_stripl(mpc_parser_t *a);` | Matches `a`, first consuming any whitespace to the left |
| `mpc_stripr(mpc_parser_t *a);` | Matches `a`, then consumes any whitespace to the right |
| `mpc_strip(mpc_parser_t *a);` | Matches `a`, consuming any surrounding whitespace |
| `mpc_tok(mpc_parser_t *a);` | Matches `a` and consumes any trailing whitespace |
| `mpc_sym(const char *s);` | Matches string `s` and consumes any trailing whitespace |
| `mpc_total(mpc_parser_t *a, mpc_dtor_t da);` | Matches `a` with surrounding whitespace consumed, enclosed between the start and end of input |
| `mpc_between(mpc_parser_t *a, mpc_dtor_t ad, const char *o, const char *c);` | Matches `a` between strings `o` and `c` |
| `mpc_parens(mpc_parser_t *a, mpc_dtor_t ad);` | Matches `a` between `"("` and `")"` |
| `mpc_braces(mpc_parser_t *a, mpc_dtor_t ad);` | Matches `a` between `"<"` and `">"` |
| `mpc_brackets(mpc_parser_t *a, mpc_dtor_t ad);` | Matches `a` between `"{"` and `"}"` |
| `mpc_squares(mpc_parser_t *a, mpc_dtor_t ad);` | Matches `a` between `"["` and `"]"` |
| `mpc_tok_between(mpc_parser_t *a, mpc_dtor_t ad, const char *o, const char *c);` | Matches `a` between `o` and `c`, where `o` and `c` have their trailing whitespace stripped |
| `mpc_tok_parens(mpc_parser_t *a, mpc_dtor_t ad);` | Matches `a` between `"("` and `")"`, each with trailing whitespace consumed |
| `mpc_tok_braces(mpc_parser_t *a, mpc_dtor_t ad);` | Matches `a` between `"<"` and `">"`, each with trailing whitespace consumed |
| `mpc_tok_brackets(mpc_parser_t *a, mpc_dtor_t ad);` | Matches `a` between `"{"` and `"}"`, each with trailing whitespace consumed |
| `mpc_tok_squares(mpc_parser_t *a, mpc_dtor_t ad);` | Matches `a` between `"["` and `"]"`, each with trailing whitespace consumed |

Apply Functions

| Function | Description |
| --- | --- |
| `void mpcf_dtor_null(mpc_val_t *x);` | Empty destructor. Does nothing |
| `mpc_val_t *mpcf_ctor_null(void);` | Returns `NULL` |
| `mpc_val_t *mpcf_ctor_str(void);` | Returns `""` |
| `mpc_val_t *mpcf_free(mpc_val_t *x);` | Frees `x` and returns `NULL` |
| `mpc_val_t *mpcf_int(mpc_val_t *x);` | Converts a decimal string `x` to an `int*` |
| `mpc_val_t *mpcf_hex(mpc_val_t *x);` | Converts a hex string `x` to an `int*` |
| `mpc_val_t *mpcf_oct(mpc_val_t *x);` | Converts an oct string `x` to an `int*` |
| `mpc_val_t *mpcf_float(mpc_val_t *x);` | Converts a string `x` to a `float*` |
| `mpc_val_t *mpcf_escape(mpc_val_t *x);` | Converts a string `x` to an escaped version |
| `mpc_val_t *mpcf_escape_regex(mpc_val_t *x);` | Converts a regex `x` to an escaped version |
| `mpc_val_t *mpcf_escape_string_raw(mpc_val_t *x);` | Converts a raw string `x` to an escaped version |
| `mpc_val_t *mpcf_escape_char_raw(mpc_val_t *x);` | Converts a raw character `x` to an escaped version |
| `mpc_val_t *mpcf_unescape(mpc_val_t *x);` | Converts a string `x` to an unescaped version |
| `mpc_val_t *mpcf_unescape_regex(mpc_val_t *x);` | Converts a regex `x` to an unescaped version |
| `mpc_val_t *mpcf_unescape_string_raw(mpc_val_t *x);` | Converts a raw string `x` to an unescaped version |
| `mpc_val_t *mpcf_unescape_char_raw(mpc_val_t *x);` | Converts a raw character `x` to an unescaped version |
| `mpc_val_t *mpcf_strtriml(mpc_val_t *x);` | Trims whitespace from the left of string `x` |
| `mpc_val_t *mpcf_strtrimr(mpc_val_t *x);` | Trims whitespace from the right of string `x` |
| `mpc_val_t *mpcf_strtrim(mpc_val_t *x);` | Trims whitespace from either side of string `x` |

Fold Functions

| Function | Description |
| --- | --- |
| `mpc_val_t *mpcf_null(int n, mpc_val_t** xs);` | Returns `NULL` |
| `mpc_val_t *mpcf_fst(int n, mpc_val_t** xs);` | Returns first element of `xs` |
| `mpc_val_t *mpcf_snd(int n, mpc_val_t** xs);` | Returns second element of `xs` |
| `mpc_val_t *mpcf_trd(int n, mpc_val_t** xs);` | Returns third element of `xs` |
| `mpc_val_t *mpcf_fst_free(int n, mpc_val_t** xs);` | Returns first element of `xs` and calls `free` on others |
| `mpc_val_t *mpcf_snd_free(int n, mpc_val_t** xs);` | Returns second element of `xs` and calls `free` on others |
| `mpc_val_t *mpcf_trd_free(int n, mpc_val_t** xs);` | Returns third element of `xs` and calls `free` on others |
| `mpc_val_t *mpcf_all_free(int n, mpc_val_t** xs);` | Calls `free` on all elements of `xs` and returns `NULL` |
| `mpc_val_t *mpcf_strfold(int n, mpc_val_t** xs);` | Concatenates all `xs` together as strings and returns result |

Case Study - Maths Language

Combinator Approach

Passing around all these function pointers might seem clumsy, but having parsers be type-generic is important as it lets users define their own output types for parsers. For example we could design our own syntax tree type to use. We can also use this method to do some specific house-keeping or data processing in the parsing phase.


As an example of this power, we can specify a simple maths grammar, that outputs int *, and computes the result of the expression as it goes along.


We start with a fold function that will fold two int * into a new int * based on some char * operator.


mpc_val_t *fold_maths(int n, mpc_val_t **xs) {

  int **vs = (int**)xs;

  if (strcmp(xs[1], "*") == 0) { *vs[0] *= *vs[2]; }
  if (strcmp(xs[1], "/") == 0) { *vs[0] /= *vs[2]; }
  if (strcmp(xs[1], "%") == 0) { *vs[0] %= *vs[2]; }
  if (strcmp(xs[1], "+") == 0) { *vs[0] += *vs[2]; }
  if (strcmp(xs[1], "-") == 0) { *vs[0] -= *vs[2]; }

  free(xs[1]); free(xs[2]);

  return xs[0];
}
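To see how the three inputs are laid out, the fold can be exercised directly without any parser. In the sketch below fold_maths is repeated from above so that the fragment compiles on its own, the typedef is a stand-in for mpc.h, and the helper eval_binary is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for mpc's typedef so this compiles without mpc.h. */
typedef void mpc_val_t;

/* Repeated from above: xs[0] and xs[2] are int*, xs[1] is the operator. */
mpc_val_t *fold_maths(int n, mpc_val_t **xs) {
  int **vs = (int**)xs;
  (void)n; /* the fold assumes exactly three inputs */
  if (strcmp(xs[1], "*") == 0) { *vs[0] *= *vs[2]; }
  if (strcmp(xs[1], "/") == 0) { *vs[0] /= *vs[2]; }
  if (strcmp(xs[1], "%") == 0) { *vs[0] %= *vs[2]; }
  if (strcmp(xs[1], "+") == 0) { *vs[0] += *vs[2]; }
  if (strcmp(xs[1], "-") == 0) { *vs[0] -= *vs[2]; }
  free(xs[1]); free(xs[2]);
  return xs[0];
}

/* Hypothetical helper: builds the heap values a parser would produce, runs
   the fold, and returns the computed int. Returns 0 if allocation fails. */
int eval_binary(int lhs, const char *op, int rhs) {
  mpc_val_t *xs[3];
  int *l = malloc(sizeof(int));
  int *r = malloc(sizeof(int));
  char *o = malloc(strlen(op) + 1);
  int *out;
  int value;
  if (l == NULL || r == NULL || o == NULL) {
    free(l); free(r); free(o);
    return 0;
  }
  *l = lhs; *r = rhs; strcpy(o, op);
  xs[0] = l; xs[1] = o; xs[2] = r;
  out = fold_maths(3, xs); /* frees o and r, hands back l */
  value = *out;
  free(out);
  return value;
}
```

Note that the fold takes ownership of all three inputs: the operator string and right operand are freed, and the left operand becomes the result.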

And then we use this to specify a basic grammar, which folds together any results.


mpc_parser_t *Expr   = mpc_new("expr");
mpc_parser_t *Factor = mpc_new("factor");
mpc_parser_t *Term   = mpc_new("term");
mpc_parser_t *Maths  = mpc_new("maths");

mpc_define(Expr, mpc_or(2,
  mpc_and(3, fold_maths,
    Factor, mpc_oneof("+-"), Factor,
    free, free),
  Factor
));

mpc_define(Factor, mpc_or(2,
  mpc_and(3, fold_maths,
    Term, mpc_oneof("*/"), Term,
    free, free),
  Term
));

mpc_define(Term, mpc_or(2, mpc_int(), mpc_parens(Expr, free)));
mpc_define(Maths, mpc_whole(Expr, free));

/* Do Some Parsing... */

mpc_cleanup(4, Expr, Factor, Term, Maths);

If we supply this parser with something like (4*2)+5, we can expect it to output 13.

Language Approach

It is possible to avoid passing in and around all those function pointers, if you don't care what type is output by mpc. For this, a generic Abstract Syntax Tree type mpc_ast_t is included in mpc. The combinator functions which act on this don't need information on how to destruct or fold instances of the result as they know it will be an mpc_ast_t. So there are a number of combinator functions which work specifically (and only) on parsers that return this type. They reside under mpca_*.

Doing things via this method means that all the data processing must take place after the parsing. In many instances this is not an issue, and may even be preferable.

It also allows for one more trick. As all the fold and destructor functions are implicit, the user can simply specify the grammar of the language in some nice way and the system can try to build a parser for the AST type from this alone. For this there are a few functions supplied which take in a string, and output a parser. The format for these grammars is simple and familiar to those who have used parser generators before. It looks something like this.

number "number" : /[0-9]+/ ;
expression      : <product> (('+' | '-') <product>)* ;
product         : <value>   (('*' | '/')   <value>)* ;
value           : <number> | '(' <expression> ')' ;
maths           : /^/ <expression> /$/ ;

The syntax for this is defined as follows.

| Syntax | Meaning |
| --- | --- |
| `"ab"` | The string `ab` is required. |
| `'a'` | The character `a` is required. |
| `'a' 'b'` | First `'a'` is required, then `'b'` is required. |
| `'a' \| 'b'` | Either `'a'` is required, or `'b'` is required. |
| `'a'*` | Zero or more `'a'` are required. |
| `'a'+` | One or more `'a'` are required. |
| `'a'?` | Zero or one `'a'` is required. |
| `'a'{x}` | Exactly `x` (integer) copies of `'a'` are required. |
| `<abba>` | The rule called `abba` is required. |
Rules are specified by rule name, optionally followed by an expected string, followed by a colon :, followed by the definition, and ending in a semicolon ;. Multiple rules can be specified. The rule names must match the names given to any parsers created by mpc_new, otherwise the function will crash.

The flags variable is a set of flags, MPCA_LANG_DEFAULT, MPCA_LANG_PREDICTIVE, or MPCA_LANG_WHITESPACE_SENSITIVE, which specify whether the language is predictive or whitespace sensitive.

Like with the regular expressions, this user input is parsed by existing parts of the mpc library. It provides one of the more powerful features of the library.

mpc_parser_t *mpca_grammar(int flags, const char *grammar, ...);

This takes in the right hand side of a single rule, as well as a list of any of the parsers referenced, and outputs a parser that does what is specified by the rule. The list of parsers referenced can be terminated with NULL to get an error instead of a crash when a required parser is not supplied.

mpc_err_t *mpca_lang(int flags, const char *lang, ...);

This takes in a full language (zero or more rules) as well as any parsers referred to by either the right or left hand sides. Any parsers specified on the left hand side of any rule will be assigned a parser equivalent to what is specified on the right. On valid user input this returns NULL, while if there are any errors in the user input it will return an instance of mpc_err_t describing the issues. The list of parsers referenced can be terminated with NULL to get an error instead of a crash when a required parser is not supplied.

mpc_err_t *mpca_lang_file(int flags, FILE* f, ...);

This reads in the contents of file f and inputs it into mpca_lang.

mpc_err_t *mpca_lang_contents(int flags, const char *filename, ...);

This opens and reads in the contents of the file given by filename and passes it to mpca_lang.

Case Study - Tokenizer

Another common task we might be interested in doing is tokenizing some block of text (splitting the text into individual elements) and performing some function on each one of these elements as it is read. We can do this with mpc too.

First, we can build a regular expression which parses an individual token. For example, if our tokens are identifiers, integers, commas, periods and colons, we could build something like this: mpc_re("\\s*([a-zA-Z_]+|[0-9]+|,|\\.|:)").

Next we can strip any whitespace, and use mpc_apply to add a callback function which gets called every time this regex is parsed successfully: mpc_apply(mpc_strip(mpc_re("\\s*([a-zA-Z_]+|[0-9]+|,|\\.|:)")), print_token).

Finally we can surround all of this in mpc_many to parse it zero or more times. The final code might look something like this:

#include <stdio.h>
#include "mpc.h"

static mpc_val_t *print_token(mpc_val_t *x) {
  printf("Token: '%s'\n", (char*)x);
  return x;
}

int main(int argc, char **argv) {

  const char *input = "  hello 4352 ,  \n foo.bar   \n\n  test:ing   ";

  mpc_parser_t* Tokens = mpc_many(
    mpcf_all_free,
    mpc_apply(mpc_strip(mpc_re("\\s*([a-zA-Z_]+|[0-9]+|,|\\.|:)")), print_token));

  mpc_result_t r;
  mpc_parse("input", input, Tokens, &r);

  mpc_delete(Tokens);

  return 0;
}

Running this program will produce output something like this:

Token: 'hello'
Token: '4352'
Token: ','
Token: 'foo'
Token: '.'
Token: 'bar'
Token: 'test'
Token: ':'
Token: 'ing'

By extending the regex we can parse many more types of tokens and quickly build a tokenizer for whatever language we are interested in.

Error Reporting

mpc provides some automatic generation of error messages. These can be enhanced by the user, with use of mpc_expect, but many of the defaults should be both useful and readable. An example of an error message might look something like this:

<test>:0:3: error: expected one or more of 'a' or 'd' at 'k'

Misc

Here are some other misc functions that mpc provides. These functions are subject to change between versions so use them with some care.

void mpc_print(mpc_parser_t *p);

Prints out a parser in some weird format. This is generally used for debugging so don't expect to be able to understand the output right away without looking at the source code a little bit.

void mpc_stats(mpc_parser_t *p);

Prints out some basic stats about a parser. Again used for debugging and optimisation.

void mpc_optimise(mpc_parser_t *p);

Performs some basic optimisations on a parser to reduce its size and increase its running speed.

Limitations & FAQ

I'm getting namespace issues due to `libmpc`, what can I do?

A renamed version of this project, pcq, is hosted on the pcq branch and should be usable without namespace issues.

Does mpc support Unicode?

mpc only supports ASCII. Sorry! Writing a parser library that supports Unicode is pretty difficult. I welcome contributions!

Is mpc binary safe?

No. Sorry! Including NUL characters (zero bytes) in a string or a file will probably break it. Avoid this if possible.

The Parser is going into an infinite loop!

While it is certainly possible there is an issue with mpc, it is probably the case that your grammar contains left recursion. This is something mpc cannot deal with. Left recursion is when a rule directly or indirectly references itself on the left hand side of a derivation. For example consider this left recursive grammar intended to parse an expression.

expr : <expr> '+' (<expr> | <int> | <string>);

When the rule expr is called, it looks for the first rule on the left. This happens to be the rule expr again. So again it looks for the first rule on the left, which is expr again, and so on. To avoid left recursion this can be rewritten (for example) as the following. Note that rewriting as follows also changes the operator associativity.

value : <int> | <string> ;
expr  : <value> ('+' <expr>)* ;

Avoiding left recursion can be tricky, but is easy once you get a feel for it. For more information you can look on Wikipedia, which covers some common techniques and more examples. Possibly in the future mpc will support functionality to warn the user or rewrite grammars which contain left recursion, but it won't for now.

Backtracking isn't working!

mpc supports backtracking, but it may not work as you expect. It isn't a silver bullet, and you still must structure your grammar to be unambiguous. To demonstrate this behaviour examine the following erroneous grammar, intended to parse either a C style identifier, or a C style function call.

factor : <ident>
       | <ident> '('  <expr>? (',' <expr>)* ')' ;

This grammar will never correctly parse a function call because it will always first succeed parsing the initial identifier and return a factor. At this point it will encounter the parenthesis of the function call, give up, and throw an error. Even if it were to try and parse a factor again on this failure it would never reach the correct function call option because it always tries the other options first, and always succeeds with the identifier.

The solution to this is to always structure grammars with the most specific clause first, and more general clauses afterwards. This is the natural technique used for avoiding left-recursive grammars and ambiguity, so is a good habit to get into anyway.

Now the parser will try to match a function first, and if this fails backtrack and try to match just an identifier.

factor : <ident> '('  <expr>? (',' <expr>)* ')'
       | <ident> ;

An alternative, and better, option is to remove the ambiguity completely by factoring out the first identifier. This is better because it removes any need for backtracking at all! Now the grammar is predictive!

factor : <ident> ('('  <expr>? (',' <expr>)* ')')? ;

How can I avoid the maximum string literal length?

Some compilers limit the maximum length of string literals. If you have a huge language string in the source file to be passed into mpca_lang you might encounter this. The ANSI C standard only requires compilers to support string literals of up to 509 characters. Most compilers support more than this. Visual Studio supports up to 2048 characters, while gcc allocates memory dynamically and so has no real limit.

There are a couple of ways to overcome this issue if it arises. You could instead use mpca_lang_contents and load the language from a file, or you could use a string literal for each line and let the compiler automatically concatenate them together, avoiding the limit. The final option is to upgrade your compiler. In C99 this limit has been increased to 4095.

The automatic tags in the AST are annoying!

When parsing from a grammar, the abstract syntax tree is tagged with different tags for each primitive type it encounters. For example a regular expression will be automatically tagged as regex, character literals as char, and strings as string. This is to help people wondering exactly how they might need to convert the node contents.

If you have a rule in your grammar called string, char or regex, you may encounter some confusion. This is because nodes will be tagged with (for example) string either if they are a string primitive, or if they were parsed via your string rule. If you are detecting node type using something like strstr, in this situation it might break. One solution to this is to always check that string is the innermost tag to test for string primitives, or to rename your rule called string to something that doesn't conflict.

Yes it is annoying but it's probably not going to change!
