
commonform/commonform-markup-parse


Parse Common Form markup, returning an object containing a form and an array of path-to-blank mappings.

Markup

Common Form markup uses rarely used symbols, all of which you can type on an ordinary keyboard, to structure agreements and to mark definitions, uses of defined terms, fill-in-the-blanks, and cross-references between provisions. For example:

This Agreement (this ""Agreement"") is made effective as of
[Effective Date] by and between [Seller's Legal Name] (""Seller"") and
[Buyer's Legal Name] (""Buyer"").

    \Definitions\  For purposes of this <Agreement>, the following terms
    have the following meanings:

        \\  ""Capital Stock"" means the capital stock of the Company,
        including, without limitation, the <Common Stock> and the <Preferred
        Stock>.

        \\  ""Dissolution Event"" means:

            \\  a voluntary termination of operations pursuant to {Voluntary
            Shutdown};

            \\  a general assignment for the benefit of the <Company>'s
            creditors; or

            \\  any other liquidation, dissolution or winding up of the
            <Company> (excluding a <Liquidity Event>), whether voluntary or
            involuntary.

Each subdivision of the form begins with \\ and is indented four spaces deeper than the provision that contains it. If the provision has a heading, the heading goes between the slashes, as in \Definitions\.
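Because nesting is expressed only through indentation, a tokenizer has to turn changes in indentation depth into explicit tokens before a grammar can see the structure. The JavaScript below is a minimal sketch of that idea, not this package's tokenizer: the function name `indentationTokens` and the `{type, text}` token shape are hypothetical, and it assumes every level is exactly four spaces deeper than its parent and that a provision's continuation lines sit at the same depth as its first line.

```javascript
// Hypothetical sketch: convert indentation changes into INDENT/OUTDENT tokens.
// Assumes four spaces per level and no tabs; not this package's actual tokenizer.
function indentationTokens(markup) {
  var tokens = [];
  var depth = 0; // current nesting depth, in levels of four spaces
  markup.split('\n').forEach(function (line, index) {
    if (line.trim() === '') return; // blank lines never change depth
    var spaces = line.match(/^ */)[0].length;
    if (spaces % 4 !== 0) {
      throw new Error('Line ' + (index + 1) + ': indentation is not a multiple of four');
    }
    var level = spaces / 4;
    if (level > depth + 1) {
      throw new Error('Line ' + (index + 1) + ': indented more than one level at once');
    }
    // Emit one INDENT or OUTDENT per level of change, then the line itself.
    while (depth < level) { tokens.push({type: 'INDENT'}); depth++; }
    while (depth > level) { tokens.push({type: 'OUTDENT'}); depth--; }
    tokens.push({type: 'LINE', text: line.trim()});
  });
  // Close any levels still open at the end of the input.
  while (depth > 0) { tokens.push({type: 'OUTDENT'}); depth--; }
  return tokens;
}
```

Emitting explicit INDENT and OUTDENT tokens lets a context-free grammar treat nesting like matched brackets, which is why a hand-written, context-tracking step has to come before the generated parser.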

Within a provision, terms being defined are set in ""double quotation marks"", and uses of defined terms are typed <within angle brackets>. A cross-reference puts the heading of the provision it refers to in braces, like {Particular Heading}. [Blanks to be filled in] use square brackets.
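To illustrate how these four inline constructs can be told apart, here is a hypothetical regular-expression sketch that splits one provision's text into plain strings and marked-up elements. The function name `parseInline` and the element shapes (`{definition}`, `{use}`, `{reference}`, `{blank}`) are assumptions modeled loosely on Common Form content; this is not how this package is implemented.

```javascript
// Hypothetical sketch: split one provision's text into content elements.
// Element shapes are assumptions, not this package's output format.
var INLINE = /""([^"]+)""|<([^>]+)>|\{([^}]+)\}|\[([^\]]*)\]/g;

function parseInline(text) {
  var content = [];
  var last = 0; // end of the previous match
  var match;
  INLINE.lastIndex = 0; // a global regex keeps state between calls; reset it
  while ((match = INLINE.exec(text)) !== null) {
    // Keep any plain text that precedes this piece of markup.
    if (match.index > last) content.push(text.slice(last, match.index));
    if (match[1] !== undefined) content.push({definition: match[1]});     // ""Term""
    else if (match[2] !== undefined) content.push({use: match[2]});       // <Term>
    else if (match[3] !== undefined) content.push({reference: match[3]}); // {Heading}
    else content.push({blank: match[4]});                                 // [hint]
    last = INLINE.lastIndex;
  }
  if (last < text.length) content.push(text.slice(last)); // trailing plain text
  return content;
}
```

For example, `parseInline('pursuant to {Voluntary Shutdown}; the <Company>')` returns `['pursuant to ', {reference: 'Voluntary Shutdown'}, '; the ', {use: 'Company'}]`.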

The Parser

var parse = require('commonform-markup-parse');
parse(stringOfMarkup); // => {form: Object, directions: Array}

The parser consists of four components:

  1. a hand-coded context-tracking tokenizer (or lexer) that emits tokens for indentation and outdentation, in addition to content tokens

  2. an LALR(1) parser generated by Jison from a Bison-like BNF grammar

  3. commonform-fix-strings, which converts the parser's AST into a valid Common Form by fixing string-related validation issues, such as extra whitespace

  4. a small algorithm that removes the hint text from fill-in-the-blanks within the form and emits path-to-hint mappings instead
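The last step can be pictured as a walk over the form that records where each blank sits. This is a hypothetical sketch, not the package's code: it assumes a form shaped like `{content: [...]}`, with blanks as `{blank: hint}` and child provisions as objects carrying a nested `form`; the function name `extractBlanks`, the `{path, hint}` mapping shape, and the path format are likewise assumptions. It modifies the form in place.

```javascript
// Hypothetical sketch of removing hints from blanks, not this package's implementation.
// Assumed form shape: {content: [string | {blank: hint} | {heading, form} | ...]}.
function extractBlanks(form, path, directions) {
  path = path || [];             // path from the root form to `form`
  directions = directions || []; // accumulated {path, hint} mappings
  form.content.forEach(function (element, index) {
    if (typeof element !== 'object' || element === null) return; // plain text
    var here = path.concat('content', index);
    if (Object.prototype.hasOwnProperty.call(element, 'blank')) {
      directions.push({path: here, hint: element.blank}); // remember the hint...
      element.blank = '';                                 // ...and empty the blank in place
    } else if (element.form) {
      // Child provision: recurse with the path extended into its form.
      extractBlanks(element.form, here.concat('form'), directions);
    }
  });
  return directions;
}
```

Recording a full path for each blank, rather than only its hint, lets a caller locate that blank again in the hint-free form.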

The parser passes the commonform-markup-tests test suite.

If you'd like to write a parser in a different language, the test suite and this package are the best places to start. Your language probably already has a Bison clone or a parser-combinator library that can express a BNF grammar.