Skip to content

Latest commit

 

History

History
572 lines (521 loc) · 23.2 KB

AangHOAS.org

File metadata and controls

572 lines (521 loc) · 23.2 KB

Design and Implementation of the Aang Programming Language

Getting Started

Add Code example

Licenses

Add license for the code and prose

Motivation / Goals

Arithemtic types

Writing in Denotative Relational Literate Programming

Denotational Design

Denotational design is a methodology developed by Conal Elliott, to improve the state of software development, currently most software is developed without any design and a dose of (let’s just do it), this leaves a bunch of problems like technical debt, leaky abstraction and much more, denotational design at it’s essence is to make a mathematical model of your types and operations by simple math that has been discovered for other reasons, for example we can denotate a stack data structure to be a function that takes in a natural number and returns an a, so Nat -> a, then we will denotate the functions that use the Stack like empty for an empty stack.

  • I recommend listening to the podcasts (link podcasts)
  • (Link advent of Haskell article)

Literate Programming

Literate programming is a paradigm where you mix code and prose (concerning the program at hand) at the same location, this has many benefits for example, readability, it is much easier for the reader to understand the underlying code compared to just inline comments or no comments at all, I think this is an extremely relatable factor for a lot us programmers, as Harold Abelson said:

Programs must be written for people to read, and only incidentally for machines to execute.

I am personally interested in exploring combining denotational design and literate programming together, where you specify your design with denotational design and only then, think about the implementation.

Functional Relational Programming

Okay, so we have denotational design and we use literate programming to put it at the forefront, we need a structure or better yet an architecture, for the implementation of said program, this is where functional relational programming comes at the foreground, it is an architecture designed to reduce complexity via using functional and relational programming, it turns out that it actually works nicely with literate programming and by extension denotational design. Okay we have all the individual pieces for it to be: denotational relational literate programming.

Denotative embedding

I went on a quest to find a way to to make a programming language using denotational semantics, after some time, I discovered tagless-final.

Tagless-final

Tagless-final is a technique of embedding domain specific languages (DSLs) in to the host langauge (Haskell in my case), instead of using data constructors as a representation of DSL terms, you typically use functions inside of a typeclass (or ML modules), this allows excellent extensibility, it is a fine solution to the expression problem, the main attraction for me is that from the author’s words:

Doing a tagless-final embedding is literally writing a denotational semantics for the DSL – in a host programming language rather than on paper.

That quote got me excited and after understanding the technique, I can say it does a really good job at it. There is one huge pitfall though for creating DSLs and that’s that, A tagless-final embedding is a shallow embedding, meaning we have no AST at our disposal, for applications like parsing or optimization.

The path for a solution

I have looked into many papers to solve this problem like Deep embedding with class and Combining Shallow and Deep EDSLs, the former uses an AST data type for a representation and includes a backdoor via existenial type for extension, the issue comes from the fact that it loses the denotational semantic properties that makes tagless-final so attractive, while the latter starts with a deep embedding with a shallow wrapper on top, which automatically makes the technique non-applicable for my needs, a bad solution for this problem is to lift term level functions to the type level and embed those functions to constructors, while this works, it is a far cry from a realistic and satisfactory programming experience since in Haskell, type level programming is inadequate, with no type level lambdas or first class type level functions (called families).

Design

Functionality

Aang is a simply typed lambda calculus with constructors and arithemtic types, arithemtic types meaning sum, product, negative and fractional types, The following defines the following the types and expressions. Here are the inference rules:

For types:

  • Base Type
  • Function Type

For Expressions:

Variables

Variables are de bruijn indice.

\[
\inference {A \quad B}{C}[d]
\]
  • Abstractions
  • Applications
  • Constructors
  • Pair
  • Fst
  • Snd
  • Unit
  • Either

Specification

AST

Types

Expressions

Variables

Similar as in the types heading we can also use Haskell’s variable semantics therefore we don’t have to specify variables in our language. Variables are de bruijn indices, we can specify them as:

variable0 :: wrap (h, a) a

And since we use de bruijn we have to specify the inductive case as:

variableSucc :: wrap h a -> wrap (h, any) a
Abstractions

Abstractions correspond to function definitions, which we can think of as a function that takes an indentifier and an expression:

abstraction :: wrap (env, a) b -> wrap env (a -> b)
Application

Application is just function application and can be specified as:

application :: wrap env (a -> b) -> wrap env a -> wrap env b
Constructors

Constructors are the builders of types, therefore their specification must not consume anything:

const :: wrap (a, h) a
Pair

Pair is the combination of two types, in Haskell it is the (,) type.

pair :: wrap env a -> wrap env b -> wrap env (a, b)
Fst

fst is grabbing the first value of the pair type.

fst :: wrap env (a, b) -> wrap env a
Snd

Snd is identical to Fst except it grabs the second value.

snd :: wrap env (a, b) -> wrap env b
Unit

Unit is the terminal object of the CCC, we can describe the function as:

unit :: wrap env ()

Parser

Instead of making a lexer + parser, parser combinators provides us a different approach where we merge both steps into only one. We will use the MegaParsec library for a parser, in this section we will specify each parser and then combine them for a complete parser.

Parser Type

The main type in MegaParsec is the ParsecT e s m a, the first e is the error type, s stream type, m a is the underlying monad. We will abstract the underlying monad via the type synonym.

type Parsec e s = ParsecT e s Identity

And finally another abstraction over the e error and the stream type being String.

type Parser = Parsec Void String

Parser combinators

Specification of all the parser combinators are:

parseVariables0 :: Lambda wrap h a => Parsec wrap h a
parseVariableSucc :: Lambda wrap h a => Parsec wrap h a
parseApplication :: Lambda wrap h a => Parsec wrap h a
parseAbstraction :: Lambda wrap h a => Parsec wrap h a
parsePair :: Lambda wrap h a => Parsec wrap h a
parseFst :: Lambda wrap h a => Parsec wrap h a
parseSnd :: Lambda wrap h a => Parsec wrap h a
parseUnit  :: Lambda wrap h a => Parsec wrap h a

We can combine these combinators into one parser:

parseAang :: Lambda wrap h a => Parsec wrap h a

The full picture

The full API:

wrap env a
variable0 :: wrap (h, a) a
variableSucc :: wrap h a -> wrap (h, any) a
application :: wrap h (a -> b) -> wrap h a -> wrap h b
abstraction :: wrap (env, a) b -> wrap env (a -> b)
pair :: wrap env a -> wrap env b -> wrap env (a, b)
fst :: wrap env (a, b) -> wrap env a
snd :: wrap env (a, b) -> wrap env b
unit :: wrap env ()

-- Parsing
Lambda wrap h a => Parsec wrap h a
parseVariables0 :: Lambda wrap h a => Parsec wrap h a
parseVariableSucc :: Lambda wrap h a => Parsec wrap h a
parseApplication :: Lambda wrap h a => Parsec wrap h a
parseAbstraction :: Lambda wrap h a => Parsec wrap h a
parsePair :: Lambda wrap h a => Parsec wrap h a
parseFst :: Lambda wrap h a => Parsec wrap h a
parseSnd :: Lambda wrap h a => Parsec wrap h a
parseUnit :: Lambda wrap h a => Parsec wrap h a
parseAang :: Lambda wrap h a => Parsec wrap h a  

Denotation

We are going to make semantic functions that map the lambda calculus world to a closed cartesian category, first we have to define the semantic domain though. Regardless, we will only make a denotation for the typeclass (let’s call it Lambda), Abstractions, Applications and Constructors since we are only introducing those primitives while everything else is for free for us by HOAS.

AST

Types

The Semantic Domain: The Function Category

You can pick any closed cartesian category really but the function category is simple, an instance comes with the library itself and the lambda calculus is generally a theory of computation of functions so it works out. So let’s model it.

_ :: Lambda (wrap a b) => wrap a b -> (a -> b)

Expressions

Variables
exr

In compiling to categories exr, correspond to the snd function in a pair, it’s in the cartesian part of the CCC, it’s defined as:

exr (a, b) = b
exl

exl corresponds to fst and it’s defined as:

exl (a, b) = a
Typing context

Typing context is a tuple that contains the term and it’s type, it looks like this: .

(.)

Simple composition. The composition primitive is necessary for a category to be a category so we can use this primitive.

Back to variables

Generally variabels correspond to identity, id but since we have the typing environment, it infact corresponds to exr.

variables0 = exr

We also have to inductive case to worry about, which can be defined beautifully as:

variablesSucc e1 = e1 . exl 
Abstractions
Curry

Curry is a higher-order function that takes in a function: (a, b) -> c and curries it to be: a -> b -> c. It’s notion in the CCC is the closed part focusing on the expontential type (the function type).

Back to the abstraction function

Abstraction in the tagless-final paper is just curry but it’s type arguments a and b are flipped, I prefer to use the curry semantics, rather than add new functions, so we must consider that the typing context is unsual where the type identifer is the first and the added argument is the second. exl extracts the first element of the tuple and we use the second argument of abstraction to apply the function therefore we gat a function a -> b.

abstraction e1 = curry e1
Applications
△ operator

The operator takes in two terms and constructs a function that is a tuple of those functions, we can specify it as:

f ~△ g = \x -> (f x, g x)

It’s notion is in cartesian part of CCC and it’s the introduction form, cartesian adds products to the category.

apply

apply is a function that takes in a tuple and apply’s the first term to the second. apply is in the closed part of CCC.

Back to application

We have what we need to make denotation.

application a b = apply . a  b
Pair

The corresponds perfectly as the introduction form to the pair.

pair e1 e2 = e1  e2

All the functions concerning products is the cartesian part of the CCC, which has introduction and projections.

Fst

Fst is exl.

fst e1 = exl e1  
Snd

Snd is exr

snd e1 = exr e1  
Unit

The unit function corresponds to the it function earlier.

unit e1 = it e1

Parsing

Parsing types

Parser type

While the parser type is slighty complicated being Parsec (wrap h a), we can think of it as String -> (wrap h a), String is [Char], we can also say the denotation of Char is a disjoint union, (a sum type) and a denotation of a list is a sequence and since we have the denotation of wrap h a we can therefore write the semantic function as: While the notation of sequence is (), since parantheses are used a lot in Haskell we will instead use: ⟨⟩.

_ :: Lambda (wrap h a) => Parsec (wrap h a) -> (Char -> (h -> a))

Other than this type, there will be no more denotations since that goes into the implementation details of the library (MegaParsec) which currently has no denotional semantics described for us to use.

The full picture

This shows the complete denotation, I think it shows the beauty and elegance of denotational design, combined with literate programming.

_ :: Lambda (wrap a b) => wrap a b -> (a -> b)
variables0 = exr
abstraction e1 = curry e1
variablesSucc e1 = e1 . exl   
application a b = apply . a  b
pair e1 e2 = e1  e2
fst e1 = exl e1
snd e1 = exr e1

-- Parsing
_ :: Lambda (wrap h a) => Parsec (wrap h a) -> (Char -> (h -> a))

Implementation

Infrastructure

Add libraries

Run bash to install MegaParsec and HList.

cabal install --lib megaparsec
cabal install --lib HList

Language extensions

Am using advanced GHC extensions to mostly compute at the type level per the requirements of Denotative embedding.

:set -XPartialTypeSignatures
:set -XScopedTypeVariables
:set -XOverloadedStrings
:set -XQuasiQuotes
:set -XTemplateHaskell
:set -XTypeFamilies
:set -XPartialTypeSignatures
:set -XUndecidableInstances
:set -XPolyKinds
:set -XStandaloneDeriving
:set -XGADTs
:set -XAllowAmbiguousTypes
:set -XDataKinds

Imports

Load imports

GHCi requires us to load imports before using them.

:set -package base
:set -package megaparsec
:set -package HList

Importing a parser library and type level programming libraries.

import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Void
import Data.HList.HList  

Multi-line

This options allows literate programming with Haskell to be much better where it allows to make multi-line functions, (org-babel connects to ghci).

:set +m

Compilation target

Using Emacs and org-mode major mode, use this code block to compile Aang, it references the web of codeblocks for ease of use. C-c this code block to compile the full program.

Compile it

Essential State

Types

Function type

The main type that we are going to use is the function type (->), it comes built in with Haskell.

HList

We can use tuples to represent the environment but that is a messy solution instead we will use an HList instead. We will use the HList provided from the HList library.

Relations

In the out of the tar pit paper, the authors suggest only using relations and more generally the relational algebra for the state part of a program, we adhere to the paper by using record types analogously as relations. As I said before, record types in Haskell can be analogous to relations (tables in SQL), infact, this approach is used in Persistent which is the most popular ORM in Haskell and the native Haskell database Project-M36 (check this project out, it’s really underrated). The main relation is the R relation which has one pair, unR is the attribute’s name and it’s type is the function type. Let’s define it:

data Reader h a = MkReader {unReader :: HList h -> a}

The R relation is actually isomorphic to the function type since they are representially the same.

Essential Logic

AST

Classes

Expr is the AST of the Aang language, as I have have said before, Aang uses tagless-final so functions instead of an ADT to descibe an AST.

class CoCartesian wrap where
  inl :: wrap h a -> wrap h (Either a b)
  inr :: wrap h b -> wrap h (Either a b)
  prl :: wrap h (Either a b) -> wrap a a
  prr :: wrap h (Either a b) -> wrap a a
--   
data LUnit = LUnit  
class AST wrap where
  variable0 :: wrap (a ': xs) a
  variableSucc :: wrap as a -> wrap (b ': as) a
  abstraction :: wrap (a ': xs) b -> wrap xs (a -> b)
  application :: wrap env (a -> b) -> wrap env a -> wrap env b
  unit :: wrap env LUnit
  pair :: wrap env a -> wrap env b -> wrap env (a, b)
  fst' :: wrap env (a, b) -> wrap env a
  inl :: repr env a -> repr env (Either a b)
  inr :: repr env b -> repr env (Either a b)
  snd' :: wrap env (a, b) -> wrap env b 
  void :: repr env Void 
  -- unit :: wrap h ()
  -- intoLeft :: wrap h a -> wrap h (Either a b)
  -- intoRight :: wrap h b -> wrap h (Either a b)
  -- prjLeft :: wrap h (Either a b) -> wrap h a
  -- prjRight :: wrap h (Either a b) -> wrap h b

-- Necessary comment for the where clause to be closed, ob-haskell should be improved :)
Closed
class Closed where
  apply :: (a -> b, a) `k` b
Cartesian
class Closed where
  triangle :: (a `k` c) -> (a `k` d) -> (a `k` (c, d))

Instance

Helper functions

Helper functions that will make the code cleaner.

apply (f, x) = f x
triangle f g = \x -> (f x, g x)
fe1 e1 = \x -> (unReader e1 x)
fe2 e2 = \x -> (unReader e2 x)
it' h = ()
  
Back to reader instance

Since we are doing the AST, the tagless-final way, the design and implementation are actually not that different which I find to be so beautiful.

:{
instance AST Reader where
  variable0 = MkReader $ \(HCons x _) -> x
  variableSucc (MkReader f) = MkReader $ \(HCons _ env) -> f env
  abstraction e1 = MkReader $ (\f x -> unReader e1 (HCons x f))
  application e1 e2 = MkReader $ (\f -> (unReader e1 f) (unReader e2 f))
  unit = MkReader $ \env -> LUnit
  pair e1 e2 = MkReader $ (\env -> (unReader e1 env, unReader e2 env))
  fst' e1 = MkReader $ (\env -> fst $ (unReader e1 env))
  snd' e1 = MkReader $ \env -> snd $ (unReader e1 env)
  inl f = MkReader $ (\env -> Left (unReader f env))
  inr f = MkReader $ (\env -> Right (unReader f env))
  void = undefined
:}      
  -- pair e1 e2 = MkReader $ triangle (fe1 e1) (fe2 e2)
 
  -- unit = MkReader $ it'

-- T

Parser

To be able to parse, we need to have a an AST representation which tagless-final does not have, (remember we are using functions instead of constructors). Am going to use a GADT as an initial encoding, after that use Dynamic to fix type differences and only then can we move to functions and tagless-final.

Monomorphism requirement

Types that go into the dynamic type must be monomorphic, therefore we restrict polymorphism aspect of our GADT with:

data Repr = Repr deriving (Typeable, Show)
data Repr2 = Repr2 deriving (Typeable, Show)
data Env = Env deriving (Typeable, Show)
data Any = Any deriving (Typeable, Show)

Tree representation

Initial is the GADT that I was talking about, it is nearly equivalent to the AST typeclass except we have constructors instead of functions.

:{
data Initial i h a where
  Eval :: i h a -> Initial i h a  
  Variable :: Initial i (h, a) a
  VariableSucc :: Initial i h a -> Initial i (h, any) a
  Abstraction :: Initial i (h, a) b -> Initial i h (a -> b)
  Application :: Initial i h (a -> b) -> Initial i h a -> Initial i h b
  Pair :: Initial i h a -> Initial i h b -> Initial i h (a, b)
  Fst :: Initial i h (a, b) -> Initial i h a
  Snd :: Initial i h (a, b) -> Initial i h b
  Unit :: Initial i h ()
:}

--

Hiding the type paremeters

We need to hide the type parameters to be able to easily combine parsers, we can do that via an existential type.

data Exists where
  Symantic :: forall h a. Initial h a -> Exists
--

We also need an eliminator to get that type back.

:{
cps :: Exists -> (forall h a r. Initial h a -> r) -> r
cps (Symantic t) k = k t
:}  

Back to parser

Simple parser combinations, we are not using the R type but instead we are overloading the operations

type Parser = Parsec Void String

:{
parseVar :: Parser Exists
parseVar = string "Var" *> space1 *> pure (Symantic Variable)
-- parseVariableSucc :: Parser Exists
-- parseVariableSucc = string "Succ" *> space1 *> pure (toDyn Abstraction)
-- parseAbstraction :: Parser Exists
-- parseAbstraction = string "Abstraction" *> space1 *> pure (toDyn Abstraction)
-- parseApplication :: Parser Exists 
-- parseApplication = string "Application" *> space1 *> pure (toDyn Application) 
-- parsePair :: Parser Exists
-- parsePair = string "Pair" *> space1 *> pure (toDyn Pair)   
-- parseFst :: Parser Exists
-- parseFst = string "Fst" *> space1 *> pure (toDyn Fst)   
-- parseSnd :: Parser Exists
-- parseSnd = string "Snd" *> space1 *> pure (toDyn Snd)     
-- parseUnit :: Parser Exists
-- parseUnit = string "Unit" *> space1 *> pure (toDyn Unit)
-- parseAang :: Parser Exists
-- parseAang = parseVar <|> parseVariableSucc <|> parseAbstraction <|> parseApplication <|> parsePair <|> parseFst <|> parseSnd <|> parseUnit
:} 

To Inital representation

After we have parsed, we need to get back to the intial representation so that we can aftewards translate to the corresponding typeclass.

Translation to tagless-final (Translator)

Now that we have an AST, we can then use tagless-final in all of it’s glory, we just have to translate the initial to the final first.

:{
toFinal :: AST wrap => forall h a. Initial wrap h a -> wrap h a
toFinal (Eval i) = i
toFinal (Variable) = variable0
toFinal (VariableSucc a) = variableSucc (toFinal a)
toFinal (Application a b) = application (toFinal a) (toFinal b)
toFinal (Abstraction a) = abstraction (toFinal a)
toFinal (Pair a b) = pair (toFinal a) (toFinal b)
toFinal (Fst a) = fst' (toFinal a)
toFinal (Snd b) = snd' (toFinal b)
toFinal (Unit) = unit
  :}

Evaluator

Evaluation is simple with just the function:

eval e = unReader e ()
:{
data IR h a where
  Var :: h a -> IR h a
data R a = R {unR :: a}  
evalI :: IR R t -> t
evalI (Var t) = unR t
:}  

Accidental State And Control

Other (Interfacing)

Footnotes

[fn:1]: