
The home of the L4 DSL

Hi! If you aren’t an L4 internal developer, or if you aren’t already familiar with L4, you might want to look at LegalSS first. This documentation is primarily intended for core devs – programmers working on the compiler toolchain and IDE for L4 itself.

L4 Language Family

The L4 language family currently comprises:

a compiler for the developer-friendly Natural L4 syntax, in the directory lib/haskell/natural4

The syntax uses a plethora of English keywords arranged in a sentence-like structure, similar to languages like SQL.

The current IDE is a spreadsheet environment. In the future, we may add support for text-mode IDEs if there is demand.

The compiler consists of a parser, interpreter, and transpiler to a variety of downstream targets.

a semantically rigorous Core L4 syntax which lives in the repo smucclaw/baby-l4

Please see https://github.com/smucclaw/baby-l4 and the L4 Language Report.

Transpilation Targets

L4 transpiles to the following targets; an L4 “program” can be exported to any of these formats:

| Target                       | Purpose                    | Status (Dec 2022) |
|------------------------------+----------------------------+-------------------|
| Purescript                   | decision logic             | stable            |
| Haskell                      | expose AST & IR internals  | stable            |
| Org-Mode                     | expose AST & IR internals  | stable            |
| Epilog                       | decision logic             | early release     |
| ASP (Answer Set Programming) | abduction                  | early release     |
| Markdown                     | human readability          | early release     |
| Petri Nets                   | workflows and verification | early release     |
| Typescript                   | data schema as class types | early release     |
| JSON                         | instance data values       | early release     |
| Prolog                       | decision logic             | in progress       |
| ProB                         | formal verification        | in progress       |
| DMN                          | decision logic             | in progress       |
| mCRL2                        | formal verification        | evaluation        |
| Uppaal                       | formal verification        | evaluation        |
| Tapaal                       | formal verification        | evaluation        |
| BPMN                         | workflows for execution    | on roadmap        |
| DocAssemble                  | document assembly          | on roadmap        |
| Catala                       | decision logic             | on roadmap        |
| Blawx                        | decision logic, ontology   | on roadmap        |
| MorphIR                      | decision logic, ontology   | on roadmap        |
| Word                         | human readability          | on roadmap        |
| PDF                          | human readability, signing | on roadmap        |
| LegalDocumentML              | standards-based interop    | on roadmap        |
| Alloy                        | formal verification        | on roadmap        |

After a successful run of the natural4 interpreter you will find exports under the workdir directory.

Natural L4

Spreadsheet Interface

LegalSS contains examples of L4 “source code”.

L4 developers – the legal engineers working with L4 source code to build legal infrastructure and applications – are supported by a sidebar in the spreadsheet that displays warnings, errors, and exported files.

That sidebar is generated by the L4 toolchain, which makes sense of L4 source code from the spreadsheet.

That L4 toolchain is, in turn, supported by the L4 internal developer community. We also refer to internal developers as core developers.

Operating instructions for the parts of the toolchain that bring up a web-based development environment (Google Sheets) and web application (end-user-facing Vue app) are available at https://docs.google.com/document/d/1EvyiQhSgapumBRt9UloRpwiRcgVhF-m65FVdAz3chfs/edit

The rest of this documentation introduces the logic and semantics of natural4, its internal data structures, and its surface syntax.

Along the way it discusses the internal structure of our compiler codebase.

Compiler Toolchain

Prerequisites

  1. Clone this repository.
  2. (For Nix users only:) You will have your own setup universe, which we won’t go into here; if you use Nix, you probably already have Haskell well in hand. (You are using Nix if your $PATH contains $HOME/.nix-profile/bin/.)
  3. If you’re not a Nix user, install GHCup.
  4. After installing GHCup, your $PATH should contain $HOME/.ghcup/bin/
  5. Use GHCup to install Stack by running ghcup tui.
  6. After installing Stack, your $PATH should contain $HOME/.local/bin/
  8. For VS Code and Emacs users: use GHCup to install HLS by manually running
    ghcup compile hls --git-ref master --ghc 9.2.5 --cabal-update
        
  8. For Emacs users: proceed to the instructions at lsp-haskell.
  9. For VS Code users: proceed to the instructions at vscode-haskell.
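
Once those steps are done, a quick sanity check (assuming a POSIX shell) is that the following commands succeed and print version numbers:

ghcup --version
stack --version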

Installation

After your Haskell setup is working well enough to run stack,

src/smucclaw/dsl$ cd lib/haskell/natural4
natural4$ stack test

It may take some time to compile GHC, plus a whole raft of dependencies.

Confirming the install

A number of sample CSV inputs can be found in the test/examples directory. (The following should be run from the lib/haskell/natural4 subdirectory.)

natural4$ stack run -- --workdir=workdir --toasp --tomd test/examples/mustsing-latest.csv

This should compile the L4 parser and produce a bunch of output under the workdir/ directory.

You can browse that output using any text editor or filesystem explorer.

Install the binary

If you’re happy with the above output,

natural4$ stack install

will install a natural4-exe binary in your $HOME/.local/bin directory. After the installation procedure above, that directory should appear in your $PATH.
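
Assuming it does, you can invoke the installed binary directly; for example, from the natural4 directory:

natural4$ natural4-exe --workdir=workdir test/examples/mustsing-latest.csv

(natural4-exe accepts the same arguments that stack run -- passes along above.)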

Generate Haddock documentation

natural4$ stack haddock

will generate Haddock documentation from the Haskell source. The output takes the form of .html files; pay attention to the last dozen lines or so of the stack haddock run output to get the actual pathnames.
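
If your version of Stack supports the --open flag, you can also have the generated documentation opened in your browser directly:

natural4$ stack haddock --open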

Downloading the Spreadsheet

The typical user drafts an L4 program in a spreadsheet.

That program can be downloaded as a CSV, which this toolchain consumes.

Example: go to the “Case Study: PDPA DBNO” tab in LegalSS, and click File / Download / as CSV.

The downloaded filename will probably be quite long; for the sake of concision, we will call it pdpadbno.csv.

Is there some other way to obtain the same CSV using only the command line? Yes. If you just want to use the stock CSV, you can do

wget -O pdpadbno.csv 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSbknuGFkvvk_Pv9e97iDW9BInxd-Cj27-xC8d_SwveP1PxERV7ZmiAIU7PWXas2CEMH2bl8PyyD6X3/pub?gid=0&single=true&output=csv'

Otherwise, if you want to make your own changes and get the resulting spreadsheet in CSV form, first make a copy (File / Make a Copy in Google Sheets), if you haven’t already, and make your changes in that copy. Then, to get the URL for wget, click File / Share / Publish to web, choose the appropriate tab under ‘Link’, select CSV as the output format under ‘Embed’, and click Publish. You will then be presented with a URL; use it in place of the one in the wget example above.

Is this more or less the same file as what’s already present in test/casestudypdpadbno-latest.csv? Yes, it is. But now you know how to get a fresh copy reflecting any edits you make.

Generating the Transpilation Targets in the Workdir

The job of a compiler / transpiler / parser / interpreter toolchain is to transform user code to some useful target representation.

We’ll first want to output all the target representations to disk. To do that, navigate to dsl/lib/haskell/natural4 and run

$ stack run -- --workdir=workdir --toasp --tomd pdpadbno.csv

The result will be something like:

$ tree workdir     
workdir
└── no-uuid
    ├── aasvg
    │   ├── 2022-12-01T08:56:17.579Z
    │   │   ├── Assessment-anyall.hs
    │   │   ├── Assessment-anyall.json
    │   │   ├── Assessment-full.svg
    │   │   ├── Assessment-qjson.json
    │   │   ├── Assessment-qtree.hs
    │   │   ├── Assessment-tiny.svg
    │   │   ├── Notify_Individuals-N-anyall.hs
    │   │   ├── Notify_Individuals-N-anyall.json
    │   │   ├── Notify_Individuals-N-full.svg
    │   │   ├── Notify_Individuals-N-qjson.json
    │   │   ├── Notify_Individuals-N-qtree.hs
    │   │   ├── Notify_Individuals-N-tiny.svg
    │   │   └── index.html
    │   ├── 2022-12-07T10:01:07.764Z
    │   │   ├── Assessment-anyall.hs
    │   │   ├── Assessment-anyall.json
    │   │   ├── Assessment-full.svg
    │   │   ├── Assessment-qjson.json
    │   │   ├── Assessment-qtree.hs
    │   │   ├── Assessment-tiny.svg
    │   │   ├── Notify_Individuals-N-anyall.hs
    │   │   ├── Notify_Individuals-N-anyall.json
    │   │   ├── Notify_Individuals-N-full.svg
    │   │   ├── Notify_Individuals-N-qjson.json
    │   │   ├── Notify_Individuals-N-qtree.hs
    │   │   ├── Notify_Individuals-N-tiny.svg
    │   │   └── index.html
    │   └── LATEST -> 2022-12-07T10:01:07.764Z
    ├── asp
    │   ├── 2022-12-01T08:55:44.907Z.lp
    │   └── 2022-12-07T09:35:17.752Z.lp
    ├── babyl4
    │   ├── 2022-12-07T10:01:07.764Z.l4
    │   └── LATEST.l4 -> 2022-12-07T10:01:07.764Z.l4
    ├── corel4
    │   ├── 2022-12-07T10:01:07.764Z.l4
    │   └── LATEST.l4 -> 2022-12-07T10:01:07.764Z.l4
    ├── grounds
    │   ├── 2022-12-07T10:01:07.764Z.txt
    │   └── LATEST.txt -> 2022-12-07T10:01:07.764Z.txt
    ├── json
    │   ├── 2022-12-07T10:01:07.764Z.json
    │   └── LATEST.json -> 2022-12-07T10:01:07.764Z.json
    ├── native
    │   ├── 2022-12-07T10:01:07.764Z.hs
    │   └── LATEST.hs -> 2022-12-07T10:01:07.764Z.hs
    ├── natlang
    │   ├── 2022-12-07T10:01:07.764Z.txt
    │   └── LATEST.txt -> 2022-12-07T10:01:07.764Z.txt
    ├── org
    │   ├── 2022-12-07T10:01:07.764Z.org
    │   └── LATEST.org -> 2022-12-07T10:01:07.764Z.org
    ├── petri
    │   ├── 2022-12-07T10:01:07.764Z.dot
    │   └── LATEST.dot -> 2022-12-07T10:01:07.764Z.dot
    ├── prolog
    │   ├── 2022-12-07T10:01:07.764Z.pl
    │   └── LATEST.pl -> 2022-12-07T10:01:07.764Z.pl
    ├── purs
    │   ├── 2022-12-07T10:01:07.764Z.purs
    │   └── LATEST.purs -> 2022-12-07T10:01:07.764Z.purs
    └── ts
        ├── 2022-12-07T10:01:07.764Z.ts
        └── LATEST.ts -> 2022-12-07T10:01:07.764Z.ts

These outputs correspond to the transpilation targets enumerated above, and can be further transformed if necessary before being put in front of the end-user’s eyeballs.

The above run passed --toasp and --tomd, which disable two of the output targets, because they are still works in progress.

You can remove those options from the command line to enable those output channels, but they may not work.

The AST and Intermediate Representations

The initial AST produced by the natural4 parser is found in the native/ output directory as LATEST.hs.

Subsequently, that AST is transformed by the Interpreter module to a data structure output to the org/ output directory as LATEST.org.

From there, it is further transformed to an intermediate representation called CoreL4, also known as Baby L4. That transformation is currently under renovation: there is a plaintext code generator which is slightly older, and a slightly newer initiative to perform direct in-memory conversion from one set of types to another. In our defense, this high-level/low-level dichotomy is also present in GHC; and the old/new pair of transformation pathways also appears in GHC’s history.

native: a Haskell data structure

This is the simplest output mode. It confirms the parse happened as intended.

You should see a LATEST.hs file under the workdir/*/native/ output directory. You can get just that output by running

natural4$ stack run -- --only native test/casestudypdpadbno-latest.csv

This should produce a screenful of output. If all went well, the output will be in the format of a Haskell data structure, containing the rules that have been parsed. It will look something like this:

[ Regulative
    { every = "Person"
    , who = Just
        ( All
            ( Pre "Who" )
            [ Leaf "walks"
            , Any
                ( Pre "any of:" )
                [ Leaf "eats"
                , Leaf "drinks"
                ]
            ]
        )
    , cond = Nothing
    , deontic = DMust
    , action =
        ( "sing"
        , []
        )
    , temporal = Nothing
    , hence = Nothing
    , lest = Nothing
    , rlabel = Nothing
    , lsource = Nothing
    , srcref = Nothing
    , upon = Nothing
    , given = Nothing
    }
]
[ Regulative
    { every = "Person"
    , who = Just
        ( All
            ( Pre "Who" )
            [ Leaf "walks"
            , Any
                ( Pre "any of:" )
                [ Leaf "eats"
                , Leaf "drinks"
                ]
            ]
        )
    , cond = Nothing
    , deontic = DMust
    , action =
        ( "sing"
        , []
        )
    , temporal = Nothing
    , hence = Nothing
    , lest = Nothing
    , rlabel = Nothing
    , lsource = Nothing
    , srcref = Nothing
    , upon = Nothing
    , given = Nothing
    }
]
[ Constitutive
    { term = "The rule-level checkbox is checked"
    , cond = Just
        ( Any
            ( Pre "any of:" )
            [ Leaf "the conditions do not hold"
            , All
                ( Pre "all of:" )
                [ Leaf "the conditions do hold"
                , Leaf "the action is satisfied"
                ]
            ]
        )
    , rlabel = Nothing
    , lsource = Nothing
    , srcref = Nothing
    }
]
[ Regulative
    { every = "Person"
    , who = Just
        ( Leaf "Qualifies" )
    , cond = Nothing
    , deontic = DMust
    , action =
        ( "sing"
        , []
        )
    , temporal = Nothing
    , hence = Nothing
    , lest = Nothing
    , rlabel = Nothing
    , lsource = Nothing
    , srcref = Nothing
    , upon = Nothing
    , given = Nothing
    }
, Constitutive
    { term = "Qualifies"
    , cond = Just
        ( All
            ( Pre "all of:" )
            [ Leaf "walks"
            , Any
                ( Pre "any of:" )
                [ Leaf "eats"
                , Leaf "drinks"
                ]
            ]
        )
    , rlabel = Nothing
    , lsource = Nothing
    , srcref = Nothing
    }
]

The semantics of this data structure are outlined below; the outline complements the rest of the project documentation.

The documentation that follows focuses on the parsing and interpretation aspects of the toolchain, and on the data structures in particular.

org: an Org-Mode file showing Interpreter output

The interpreter performs a number of transformations. For your convenience it writes each stage of those transformations out to the org/LATEST.org output file, in a format compatible with org-mode.

If you are an Emacs user, you have native support for org-mode built in; opening the LATEST.org output file should automatically load org-mode.

If you are a VS Code user you will need to install the relevant extensions.

baby-l4 / core-l4

This is a formally rigorous language representing decision logic, ontology, and workflow rules.

This is where some of the type-checking and type inference occurs.

Some of the downstream targets are produced by transpilers which proceed from CoreL4.

For details, please see https://github.com/smucclaw/baby-l4 and the L4 Language Report.

Transpilation Targets In Detail

This section is still under construction. It will give a short explanation of each transpilation target and its intended use.

prolog

Those elements of L4 which correspond to first-order logic can be transpiled to a Prolog-like syntax.

The inspiration is The British Nationality Act as a Logic Program. In principle it should be possible to express the British Nationality Act as an L4 program, from which we can extract a Prolog program.

We usually see decision-related reasoning in the context of constitutive rules: an X counts as a Y if Z holds. In the trivial case, an X counts as true if Z holds:

constitutiveRule(X) :- Z.

bna(isBritishCitizen,X) :- bna(isBritishCitizen,Y), parent(Y, X).
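% One plausible reading of the rule above (the argument order of
% parent/2 is assumed for illustration): X is a British citizen if X
% has a parent Y who is a British citizen.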

The reasoning used is backward-chaining deduction. Given a goal statement, the task is to decide if that goal is true or false. “It depends.” What does it depend on? We work through all the dependencies, backtracking, until the goal can be proven true – or false.

Prolog’s other great strength – abductive reasoning through unification – does not arise in our current swipl implementation. We do want to use this feature for planning problems, in the future.

Our current transpilation pathway to Prolog actually runs through the CoreL4 language. It is possible that in future we will go direct from Natural4.

Within the Prolog family of targets we differentiate:

SWIPL
Clingo
s(CASP)

petri: a Petri Net showing the state diagram

One of the workdir outputs is a Petri (workflow) net in GraphViz dot format. This is like a state diagram.

To convert it to an SVG which can be opened in a web browser, you will need GraphViz installed. Then run

dot -Tsvg workdir/no-uuid/petri/LATEST.dot > workdir/no-uuid/petri/LATEST.svg

It is not exactly a state diagram because things can be in multiple sub-states at once. Petri Nets are good at showing that.

aasvg: AnyAll SVG showing the decision logic

json: for consumption by other tools such as a web app

Purescript

This is consumed by the Vue web application documented in the Operating Instructions Google Doc.

If you know Haskell, you will be able to read Purescript.

Debugging

Sometimes, a downloaded CSV may not agree with the parser.

If a parse error occurs, you can enable debugging by adding --dbug to the command line, or by setting the environment variable MP_DEBUG=True.

Debugging output is super verbose. We process it with the following idiom:

filename=casestudypdpadbno-latest; MP_DEBUG=True stack run test/$filename.csv |& bin/debug2org > out/$filename.org

The debug2org script rewrites the raw debugging output into a format which is more easily viewed in Emacs org-mode. Mostly, it’s about folding: the parser’s output is organized to reflect its attempts to parse various expressions. Most of those parse attempts are “desired failures” and are not of interest; org-mode lets you hide them by pressing TAB.

Making sense of the parser debug output requires some familiarity with programming language theory, with compiler theory, and with the specifics of our L4 parser toolchain.

Development Conveniences

This little script (on macOS, at least) watches your Downloads folder: every time you Save As CSV, it moves the new download over to the test/ directory. Run it from your home directory, and edit to taste.

fswatch -x ~/Downloads/ | perl -nle 'if (my ($fn, $target) = /(Legal(?:SSv| Spreadsheets - )(.*).csv) Created.*Renamed/) { for ($target) { $_ = lc $_; s/[^a-z]//g }; rename (qq(Downloads/$fn), qq(src/smucclaw/dsl/lib/haskell/natural4/test/$target-latest.csv)) && print(qq($fn -> $target)) && system(qq(cd src/smucclaw/dsl/lib/haskell/natural4; teedate=`date +%Y%m%d-%H%M%S`.txt; stack run -- --workdir=workdir test/$target-latest.csv > workdir/no-uuid/\$teedate; ln -sf \$teedate workdir/no-uuid/LATEST.txt; head \$teedate; echo done ))} BEGIN { $|++ }'

We should probably turn this into an actual shell script, not a command-line “one-liner”.

Parser Internals

The parser consumes CSV and produces a list of rules largely corresponding to the input.

Along the way, some “inline” rules are promoted to top-level.

Promotion of inline rules to top level

TYPICALLY

In some positions, a term can be assigned a default value using the TYPICALLY keyword.

Such uses result in a top-level rule: DefTypically.

MEANS

Suppose we have

EVERY Person
  WHO Qualifies
      MEANS walks
        AND    eats
            OR drinks
 MUST sing

(This is from the Waddington example.)

This parses to two top-level rules:

  1. a regulative rule EVERY person WHO Qualifies MUST sing.
  2. a constitutive rule Qualifies MEANS walks AND eats OR drinks.

Internal Mechanics

This promotion is accomplished thanks to the Monad Writer context of the Natural L4 Parser.

https://github.com/smucclaw/dsl/blob/main/lib/haskell/natural4/src/LS/Types.hs#L151-L152
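
For a flavour of the mechanism, here is a minimal, self-contained sketch; the Rule type and the parser action below are illustrative stand-ins, not the real natural4 types.

-- Sketch only: a Writer-style "promotion" of an inline rule, using
-- stand-in types. The real parser threads a similar Writer context
-- through its parsing machinery.
import Control.Monad.Writer  -- from the mtl package

data Rule = Regulative String | Constitutive String
  deriving Show

-- While parsing the regulative rule, the inline MEANS clause is
-- 'tell'-ed to the Writer log, so it surfaces as a top-level rule.
parseEveryWho :: Writer [Rule] Rule
parseEveryWho = do
  tell [Constitutive "Qualifies MEANS walks AND (eats OR drinks)"]
  return (Regulative "EVERY Person WHO Qualifies MUST sing")

topLevelRules :: [Rule]
topLevelRules =
  let (rule, promoted) = runWriter parseEveryWho
  in  rule : promoted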

Interpreter Internals

The Interpreter conducts multiple passes over the Parser’s AST.

The following passes are on the roadmap for the Interpreter.

Entity Model

Let’s declare as a Class anything that has attributes and instances.

How do we deal with subclasses, inheritance, and diamonds?

Subclasses are constructed using the EXTENDS keyword.

HAS-A relationships are dealt with using nesting of the HAS keyword.

Diamond inheritance is excluded.
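
Schematically, in L4 source (a purely illustrative sketch of keyword placement; see LegalSS for authoritative examples):

DECLARE Person
    HAS name
        address
            HAS street
                city

DECLARE Employee
    EXTENDS Person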

Statics: rephrase as First-Order Logic

Construct trees of ANDs and ORs. How do we deal with a particular defined term appearing in multiple locations?

E.g. given a shopping cart with both alcohol and cigarettes, a LegalCheckout may refer to the “subroutine” for AgeOfMajority twice; do we need to “cache” the result of evaluation, or is that a problem for the runtimes? Perhaps we can assist by at least giving identifiers so the runtime can do the right thing.

The AnyAll library is responsible for most of this work.

The result of this transformation stage is visible in the qaHornsR and qaHornsT functions.
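
The shape of those trees is visible in the native output shown earlier. As a rough, simplified sketch of the structure involved (the real AnyAll types differ in detail):

-- Simplified sketch of an AnyAll-style boolean structure.
data Label a = Pre a | PrePost a a
  deriving Show

data BoolStruct a
  = Leaf a
  | All (Label a) [BoolStruct a]
  | Any (Label a) [BoolStruct a]
  | Not (BoolStruct a)
  deriving Show

-- The WHO clause from the mustsing example, as such a tree:
qualifies :: BoolStruct String
qualifies =
  All (Pre "Who")
    [ Leaf "walks"
    , Any (Pre "any of:") [ Leaf "eats", Leaf "drinks" ]
    ]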

Macros

If the language has support for user-defined macros or other forms of syntactic sugar, evaluate those macros and desugar to canonical form.

Type annotations

Filter all instances of TypedMulti in our AST where the snd is Just; collect the explicit type annotations.

Type inference

Filter all instances of TypedMulti in our AST where the snd is Nothing; attempt type inference based on observations of how the fst element is used elsewhere. For top marks, implement Hindley-Milner inference.
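
As a sketch of what these two filtering passes amount to (using a stand-in TypeSig; the real alias appears under “Which types were explicit and which were inferred?” below):

import Data.List.NonEmpty (NonEmpty)
import Data.Maybe (isJust, isNothing)
import Data.Text (Text)

data TypeSig = TypeSig Text        -- stand-in for the real TypeSig
type TypedMulti = (NonEmpty Text, Maybe TypeSig)

-- Terms with an explicit type annotation in the source.
explicitlyAnnotated :: [TypedMulti] -> [TypedMulti]
explicitlyAnnotated = filter (isJust . snd)

-- Terms whose types must be inferred from how they are used.
needsInference :: [TypedMulti] -> [TypedMulti]
needsInference = filter (isNothing . snd)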

This work may be delegated to the Core L4.

Alternatively, we may be able to call GHC as a library.

Interpretation Requirements for Downstream Transpilation

After the parser succeeds, we have in our hands a list of Rules.

It is now the interpreter’s job to think about the Rules, get organized, and prepare for the next step – transpilation.

The transpilers will be wanting answers to questions like:

What is the ontology? What are the entities and relationships?

The DECLARE and DEFINE keywords together construct class and instance definitions.

Roadmap for the Interpreter: Some Open Questions

Entity Relations

We need to find a way to relate different entities with one another – an Employee may have an Employer. In a database, they may be connected explicitly by id fields or they may be connected implicitly.

In our system, we allow DECLARE / HAS to nest, so that multiple levels of nesting can be created. This is analogous to records containing other records, and to HAS-A relationships between classes.

Which types were explicit and which were inferred?

The TypedMulti alias looks like this:

type KVsPair = (NonEmpty Text.Text, Maybe TypeSig)    --- so really there are multiple Values
type TypedMulti = KVsPair                             --- | apple | orange | banana | :: | Fruit   |

The Maybe TypeSig part can be used for both explicitly annotated types and inferred types.
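
For instance, the spreadsheet row in the comment above would correspond to a value along these lines (SimpleType here is a stand-in constructor, not the real TypeSig):

{-# LANGUAGE OverloadedStrings #-}
import Data.List.NonEmpty (NonEmpty (..))
import Data.Text (Text)

data TypeSig = SimpleType Text     -- stand-in constructor
type TypedMulti = (NonEmpty Text, Maybe TypeSig)

fruitRow :: TypedMulti
fruitRow = ("apple" :| ["orange", "banana"], Just (SimpleType "Fruit"))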

What warnings and errors do we produce?

In particular, the interpreter might be able to perform tree-shaking, identify dead code, and flag null pointer references.

It might also be able to identify race conditions. Maybe one transpiler can request to view the output of another transpiler backend, so the formal verification module ends up running first and then the other transpilers read output from that.

Can we get a representation of the original program text?

The Parser produces decent source locations, but the transformations performed by the Interpreter may muddy the trail.

If not, then for each node in the AST, can we at least get a link back to a source reference in the original program text?

And even beyond that: in the left column of L4 we have the (Act 1.2) cells, which provide links even farther back, to the source legislation.

What are the “statics” – the inference rules?

The interpreter needs to get its head around how all the BoolStructs go together.

Our NLG component, for instance, needs to convert text like “X gives birth to live young” into the question “Does X give birth to live young?”

So it will want to know every single “ground term” which we need to ask the user about.

The Shannon/Allen visualizer and the JSON outputters will want to know how those terms fit together: the BoolStructs with their Labels.

The web app interface builder may expect the interpreter to do some kind of reduction/optimization of the questions – see ROBDD.

What are the “dynamics” – the state transitions?

This is of particular interest to the visualizer that produces a Petri net.

The interpreter needs to get its head around how all the HENCE and LEST blocks connect.

Are there any defined terms that weren’t actually defined?

We are literally looking at capitalization to determine what’s a Defined Term.

Contract with Transpilers

Each transpiler under XPile can expect an Interpretation containing the interpreter’s analysis of the rules.

That analysis consists of:

- classtable :: of type ClsTab, a table of all the top-level classes declared
- scopetable :: of type ScopeTabs, a table of all the rule scopes defined

That Interpretation also contains:

- origrules :: a list of rules ([Rule]) as returned by the Parser
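
Schematically, the record handed to each transpiler looks like this (stand-in types; the field names come from the text above, but the real definition lives in the Interpreter module):

-- Sketch only; stand-in types.
data ClsTab    = ClsTab
data ScopeTabs = ScopeTabs
data Rule      = Rule

data Interpretation = Interpretation
  { classtable :: ClsTab     -- all top-level classes declared
  , scopetable :: ScopeTabs  -- all rule scopes defined
  , origrules  :: [Rule]     -- the rules as returned by the Parser
  }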

ClsTab

See Haddock reference.

Classes contain ontological semantics according to the conventional OOP paradigm, excluding diamond inheritance.

Classes can contain other classes, in a HAS-A relationship, using nested HAS keywords.

Classes can extend other classes, in an IS-A relationship, using the EXTENDS keyword.

ScopeTabs

Scopes in L4 are intended to constrain certain definitions, reflecting source text such as

For the purposes of Sections 1 and 2, “work-week” is defined to include half a day on Saturdays, in addition to Mondays through Fridays, but excludes Sundays.

For the purposes of Sections 3 and 4, “work-week” is defined to exclude Saturdays and Sundays.

Scopes allow us to represent these kinds of statements.

See Haddock reference.
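
Conceptually, a scope table is a two-level map, from scope to term to definition (a sketch only, not the real ScopeTabs type):

import Data.Map (Map)
import Data.Text (Text)

-- Sketch: "Sections 1 and 2" maps "work-week" to its definition there.
type ScopeName = Text
type ScopeTabs = Map ScopeName (Map Text Text)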

Language Reference

See the LegalSS L4 Manual.