Pesto specification draft

Pesto is a text-based human-editable and machine-transformable cooking recipe interchange format.

warning

This specification is work-in-progress and thus neither stable, consistent or complete.

1 About this document

This section contains various information about this document. The second section motivates why inventing another file format is necessary, followed by the goals of Pesto. After a short Pesto primer intended for the casual user the language’s syntax and semantics are presented. The linting section limits the language to useful cooking recipes. Examples for user presentation of recipes and serialization follow.

Being a literate program this document is specification and reference implementation at the same time. The code is written in Haskell and uses the parsec parser combinator library, as well as HUnit for unit tests. Even without knowing Haskell’s syntax you should be able to understand this specification. There’s a description above every code snippet explaining what is going on.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Version

1-draft

License

CC0

Website

https://6xq.net/pesto/

Discussion

https://github.com/PromyLOPh/pesto

Contributors

2 Motivation

The landscape of recipe interchange formats is quite fragmented. First of all there’s HTML microdata. Google rich snippets, which are equivalent to the schema.org microdata vocabulary, are widely used by commercial recipe sites. Although the main objective of microdata is to make content machine-readable most sites will probably use it, because it is considered a search-engine optimization (SEO). Additionally parsing HTML pulled from the web is a nightmare and thus not a real option for sharing recipes. h-recipe provides a second vocabulary that has not been adopted widely yet.

Most cooking-related software comes with their own recipe file format. Meal-Master is one example and, due to its age, a large amount of recipe files is available in this format. There does not seem to be any official documentation for the format, but inofficial ABNF grammar and format description exist. A Meal-Master recipe template might look like this:

---------- Recipe via Meal-Master (tm)

      Title: <Title>
 Categories: <Categories>
      Yield: <N servings>

    <N> <unit> <ingredient>
    …

-------------------------------<Section name>-----------------------------
  <More ingredients>

  <Instructions>

-----

Rezkonv aims to improve the Mealmaster format by lifting some of its character limits, adding new syntax and translating it to german. However the specification is available on request only. Early versions of MasterCook used the MXP file format, but newer software versions use, beside a bunch of other file formats, the XML-based MX2.

In general XML seems to be the preferred markup language for a bunch of interchange formats created around 2005.

RecipeML

Formerly known as DESSERT and released in 2002 (version 0.5). The license requires attribution and – at the same time – forbids using the name RecipeML for promotion without written permission.

eatdrinkfeelgood

Version 1.1 was released in 2002 as well, but the site is not online anymore. The DTD is licensed under the CC by-sa license.

REML

Released in 2005 (version 0.5), aims to improve support for commercial uses (restaurant menus and cookbooks). The XSD’s license permits free use and redistribution, but the reference implementation has no licensing information.

RecipeBook XML

Released 2005 as well and shared unter the terms of CC by-sa is not available on the web any more.

CookML

Last updated in 2006 (version 1.0.4) for the german-language shareware program Kalorio has a custom and restrictive licence that requires attribution and forbids derivate works.

And then there’s XML-based export formats for most currently maintained recipe managing software.

Living Cookbook

Uses a XML-based format called fdx version 1.1. There’s no specification to be found, but a few examples_ are available and those are dated 2006.

My CookBook

Uses the file extension .mcb. A specification is available.

KRecipes

Uses its own export format. However there is no documentation whatsoever.

Gourmet

The program’s export format suffers from the same problem. The only document available is the DTD.

YumML is an approach similar to those listed above, but based on YAML instead of XML. The specification has been removed from the web and is available through the Web Archive only.

Cordon Bleu (1999) encodes recipes as programs for a cooking machine and defines a Pascal-like language. Being so close to real programming languages Cordon Bleu is barely useable by anyone except programmers. Additionally the language is poorly-designed, since its syntax is inconsistent and the user is limited to a set of predefined functions.

Finally there is RxOL, created in 1985. It constructs a graph from recipes written down with a few operators and postfix notation. It does not separate ingredients and cooking instructions like every other syntax introduced before. Although Pesto is not a direct descendant of RxOL both share many ideas.

microformats.org has a similar list of recipe interchange formats.

3 Goals

First of all recipes are written by humans for humans. Thus a human-readable recipe interchange format is not enough. The recipes need to be human-editable without guidance like a GUI or assistant. That’s why, for instance, XML is not suitable and the interchange formats listed above have largely failed to gain traction. XML, even though simple itself, is still too complicated for the ordinary user. Instead a format needs to be as simple as possible, with as little markup as possible. A human editor must be able to remember the entire syntax. This works best if the file contents “make sense”. A good example for this is Markdown.

We also have to acknowledge that machines play an important role in our daily life. They can help us, the users, accomplish our goals if they are able to understand the recipes as well. Thus they too need to be able to read and write recipes. Again, designing a machine-readable format is not enough. Recipes must be machine-transformable. A computer program should be able to create a new recipe from two existing ones, look up the ingredients and tell us how many joules one piece of that cake will have. And so on.

That being said, Pesto does not aim to carry additional information about ingredients or recipes itself. Nutrition data for each ingredient should be maintained in a separate database. Due to its minimal syntax Pesto is also not suitable for extensive guides on cooking or the usual chitchat found in cooking books.

4 Introduction by example

So let’s start by introducing Pesto in a XXXnon-formal, XXX way: By example. We are now going to cook XXX. The following recipe contains all the information you need to do that.

do it.

See https://github.com/PromyLOPh/rezepte for example recipes.

5 Language syntax

Pesto parses UTF-8 encoded input data consisting of space-delimited instructions. Every character within the Unicode whitespace class is considered a space.

stream = ((,) <$> getPosition <*> instruction) `sepEndBy` spaces1
	<?> "stream"
spaces1 = many1 space

The following instructions are supported:

data Instruction =
	  Annotation String
	| Ingredient Quantity
	| Tool Quantity
	| Action String
	| Reference Quantity
	| Result Quantity
	| Alternative Quantity
	| Directive String
	| Unknown String
	deriving (Show, Eq)

instruction =
	    try annotation
	<|> try ingredient
	<|> try tool
	<|> try action
	<|> try result
	<|> try alternative
	<|> try reference
	<|> try directive
	<|> try unknown
	<?> "instruction"

The pesto grammar has two instruction types: The first one begins with a start symbol (start) and consumes any character up to and including a terminating symbol (end), which can be escaped with a backslash (\).

betweenEscaped start end =
	   char start
	*> many (try (char '\\' *> char end) <|> satisfy (/= end))
	<* char end

Annotations and actions both are of this kind:

annotation = Annotation <$> betweenEscaped '(' ')'
action = Action <$> betweenEscaped '[' ']'

Here are examples for both:

testOpterm = [cmpInstruction "(skinless\nboneless)" (Right (Annotation "skinless\nboneless"))
	, cmpInstruction "[stir together]" (Right (Action "stir together"))
	, cmpInstruction "[stir\\]together]" (Right (Action "stir]together"))]

The second one starts with one identifying character, ignores the following whitespace characters and then consumes an object or a quantity.

oparg ident cont = char ident *> spaces *> cont
ingredient = oparg '+' (Ingredient <$> quantity)
tool = oparg '&' (Tool <$> quantity)
result = oparg '>' (Result <$> quantity)
alternative = oparg '|' (Alternative <$> quantity)
reference = oparg '*' (Reference <$> quantity)

Additionally there are two special instructions. Directives are similar to the previous instructions, but consume a qstr.

directive = oparg '%' (Directive <$> qstr)

Unknown instructions are the fallthrough-case and accept anything. They must not be discarded at this point. The point of accepting anything is to fail as late as possible while processing input. This gives the parser a chance to print helpful mesages that provide additional aid to the user who can then fix the problem.

unknown = Unknown <$> many1 notspace

Below are examples for these instructions:

testOparg = [
	  cmpInstruction "+100 g flour"
	      (Right (Ingredient (Quantity (Exact (AmountRatio (100%1))) "g" "flour")))
	, cmpInstruction "&oven"
	      (Right (Tool (strQuantity "oven")))
	, cmpInstruction ">dough" (Right (Result (strQuantity "dough")))
	, cmpInstruction "|trimmings" (Right (Alternative (strQuantity "trimmings")))
	, cmpInstruction "*fish"
	      (Right (Reference (strQuantity "fish")))
	, cmpInstruction3 "% invalid" (Right (Directive "invalid")) "%invalid"
	, cmpInstruction3 "* \t\n 1 _ cheese"
	      (Right (Reference (Quantity (Exact (AmountRatio (1%1))) "" "cheese")))
	      "*1 _ cheese"
	]

5.1 Qstr

Before introducing quantities we need to have a look at qstr, which is used by them. A qstr, short for quoted string, can be – you guessed it already – a string enclosed in double quotes, a single word or the underscore character that represents the empty string.

qstr = try (betweenEscaped '"' '"')
	<|> word
	<|> char '_' *> return ""

A word always starts with a letter, followed by any number of non-space characters.

word = (:) <$> letter <*> many notspace
notspace = satisfy (not . isSpace)

The empty string can be represented by two double quotes or the underscore, but not the empty string itself.

testQstr = [
	  cmpQstr3 "\"\"" (Right "") "_"
	, cmpQstr "_" (Right "")
	, cmpQstr "" parseError

Any Unicode character with a General_Category major class L (i.e. a letter, see Unicode standard section 4.5 for example) is accected as first character of a word. That includes german umlauts as well as greek or arabic script. Numbers, separators, punctuation and others are not permitted.

	, cmpQstr "water" (Right "water")
	, cmpQstr "Äpfel" (Right "Äpfel")
	, cmpQstr "τυρί" (Right "τυρί")
	, cmpQstr "جبن" (Right "جبن")
	, cmpQstr "1sugar" parseError
	, cmpQstr "+milk" parseError
	, cmpQstr "∀onion" parseError

The remaining letters of a word can be any character, including symbols, numbers, …

	, cmpQstr "rump-roast" (Right "rump-roast")
	, cmpQstr "v1negar" (Right "v1negar")
	, cmpQstr "mush\"rooms" (Right "mush\"rooms")

…but not spaces.

	, cmpQstr " tomatoes" parseError
	, cmpQstr "tomatoes " parseError
	, cmpQstr "lemon juice" parseError
	, cmpQstr "sour\tcream" parseError
	, cmpQstr "white\nwine" parseError

If a string contains spaces or starts with a special character it must be enclosed in double quotes.

	, cmpQstr3 "\"salt\"" (Right "salt") "salt"
	, cmpQstr "\"+milk\"" (Right "+milk")
	, cmpQstr "\"soy sauce\"" (Right "soy sauce")
	, cmpQstr "\"1sugar\"" (Right "1sugar")
	, cmpQstr "\"chicken\tbreast\nmeat\"" (Right "chicken\tbreast\nmeat")

Double quotes within a string can be quoted by prepending a backslash. However the usual escape codes like \n, \t, … will not be expanded.

 	, cmpQstr "\"vine\"gar\"" parseError
 	, cmpQstr3 "\"vine\\\"gar\"" (Right "vine\"gar") "vine\"gar"
 	, cmpQstr "\"oli\\ve oil\"" (Right "oli\\ve oil")
 	, cmpQstr "\"oli\\\\\"ve oil\"" (Right "oli\\\"ve oil")
 	, cmpQstr3 "\"sal\\tmon\"" (Right "sal\\tmon") "sal\\tmon"
	]

5.2 Quantity

The instructions Ingredient, Tool and Reference accept a quantity, that is a triple of Approximately, Unit and Object as parameter.

data Quantity = Quantity Approximately Unit Object deriving (Show, Eq)

The syntactic construct is overloaded and accepts one to three arguments. If just one is given it is assumed to be the Object and Approximately and Unit are empty. Two arguments set Approximately and Unit, which is convenient when the unit implies the object (minutes usually refer to the object time, for example).

quantity = try quantityA <|> quantityB
quantityA = Quantity
	<$> approximately
	<*  spaces1
	<*> unit
	<*> (try (spaces1 *> object) <|> return "")
quantityB = Quantity
	<$> return (Exact (AmountStr ""))
	<*> return ""
	<*> object
testQuantityOverloaded = [
	  cmpQuantity "oven" (exactQuantity (AmountStr "") "" "oven")
	, cmpQuantity "10 min" (exactQuantity (AmountRatio (10%1)) "min" "")
	, cmpQuantity "100 g flour" (exactQuantity (AmountRatio (100%1)) "g" "flour")

The first two are equivalent to

	, cmpQuantity3 "_ _ oven" (exactQuantity (AmountStr "") "" "oven") "oven"
	, cmpQuantity3 "10 min _" (exactQuantity (AmountRatio (10%1)) "min" "") "10 min"

Missing units must not be ommited. The version with underscore should be prefered.

	, cmpQuantity3 "1 \"\" meal" (exactQuantity (AmountRatio (1%1)) "" "meal") "1 _ meal"
	, cmpQuantity "1 _ meal" (exactQuantity (AmountRatio (1%1)) "" "meal")
	]

Units and objects are just strings. However units should be limited to well-known metric units and some guidelines apply to Objects as well.

type Unit = String
unit = qstr

type Object = String
object = qstr

Approximately is a wrapper for ranges, that is two amounts separated by a dash, approximate amounts, prepended with a tilde and exact amounts without modifier.

data Approximately =
	  Range Amount Amount
	| Approx Amount
	| Exact Amount
	deriving (Show, Eq)

approximately = try range <|> try approx <|> exact
range = Range <$> amount <*> (char '-' *> amount)
approx = Approx <$> (char '~' *> amount)
exact = Exact <$> amount
testQuantityApprox = [
	  cmpQuantity "1-2 _ bananas" (Right (Quantity (Range (AmountRatio (1%1)) (AmountRatio (2%1))) "" "bananas"))
	, cmpQuantity "1 - 2 _ bananas" parseError
	, cmpQuantity "1- 2 _ bananas" parseError
	, cmpQuantity "1 -2 _ bananas" parseError
	, cmpQuantity "~2 _ bananas" (Right (Quantity (Approx (AmountRatio (2%1))) "" "bananas"))
	, cmpQuantity "~ 2 _ bananas" parseError
	]

Amounts are limited to rational numbers and strings. There are no real numbers by design and implementations should avoid representing rational numbers as IEEE float. They are not required and introduce ugly corner cases when rounding while converting units for example.

data Amount =
	  AmountRatio Rational
	| AmountStr String
	deriving (Show, Eq)

amount = try ratio <|> AmountStr <$> qstr
testQuantityAmount = [
	  cmpQuantity "some _ pepper" (exactQuantity (AmountStr "some") "" "pepper")
	, cmpQuantity3 "\"some\"-\"a few\" _ bananas" (Right (Quantity (Range (AmountStr "some") (AmountStr "a few")) "" "bananas")) "some-\"a few\" _ bananas"
	, cmpQuantity "~\"the stars in your eyes\" _ bananas" (Right (Quantity (Approx (AmountStr "the stars in your eyes")) "" "bananas"))
	]

Rational numbers can be an integral, numerator and denominator, each separated by a forward slash, just the numerator and denominator, again separated by a forward slash or just a numerator with the default denominator 1 (i.e. ordinary integral number).

ratio = let toRatio i num denom = AmountRatio ((i*denom+num)%denom) in
	    try (toRatio <$> int <*> (char '/' *> int) <*> (char '/' *> int))
	<|> try (toRatio <$> return 0 <*> int <*> (char '/' *> int))
	<|> try (toRatio <$> return 0 <*> int <*> return 1)

These are all equal.

testQuantityRatio = [
	  cmpQuantity "3 _ bananas" (exactQuantity (AmountRatio (3%1)) "" "bananas")
	, cmpQuantity3 "3/1 _ bananas" (exactQuantity (AmountRatio (3%1)) "" "bananas") "3 _ bananas"
	, cmpQuantity3 "3/0/1 _ bananas" (exactQuantity (AmountRatio (3%1)) "" "bananas") "3 _ bananas"

XXtwo is num and denom

	, cmpQuantity "3/5 _ bananas" (exactQuantity (AmountRatio (3%5)) "" "bananas")

three is int, num and denom

	, cmpQuantity "3/5/7 _ bananas" (exactQuantity (AmountRatio ((3*7+5)%7)) "" "bananas")
	, cmpQuantity3 "10/3 _ bananas" (exactQuantity (AmountRatio (10%3)) "" "bananas") "3/1/3 _ bananas"

Can be used with ranges and approximate too. and mixed with strings

	, cmpQuantity "1-\"a few\" _ bananas" (Right (Quantity (Range (AmountRatio (1%1)) (AmountStr "a few")) "" "bananas"))
	, cmpQuantity "1/1/2-2 _ bananas" (Right (Quantity (Range (AmountRatio (3%2)) (AmountRatio (4%2))) "" "bananas"))
	, cmpQuantity "~1/1/2 _ bananas" (Right (Quantity (Approx (AmountRatio (3%2))) "" "bananas"))
	]

5.3 Appendix

int = read <$> many1 digit
parse = runParser stream () ""

Test helpers:

isLeft (Left _) = True
isLeft _ = False
parseError = Left (newErrorUnknown (newPos "" 0 0))
cmpParser f str (Left _) = TestCase $ assertBool str $ isLeft $ runParser (f <* eof) () "" str
cmpParser f str expected = str ~: runParser (f <* eof) () "" str ~?= expected
cmpParseSerialize f str expectp@(Left _) _ = [cmpParser f str expectp]
cmpParseSerialize f str expectp@(Right expectpval) expects = [
	  cmpParser f str expectp
	, serialize expectpval ~?= expects]

Wrap qstr test in AmountStr to aid serialization test

cmpQstr input expectp = cmpQstr3 input expectp input
cmpQstr3 input (Left expect) _ = [cmpParser (AmountStr <$> qstr) input (Left expect)]
cmpQstr3 input (Right expect) expects = cmpParseSerialize (AmountStr <$> qstr) input (Right (AmountStr expect)) expects
cmpQuantity a b = cmpQuantity3 a b a
cmpQuantity3 = cmpParseSerialize quantity
cmpInstruction a b = cmpInstruction3 a b a
cmpInstruction3 = cmpParseSerialize instruction
exactQuantity a b c = Right (Quantity (Exact a) b c)
strQuantity = Quantity (Exact (AmountStr "")) ""
 test = [
 	  "quantity" ~: testQuantityOverloaded ++ testQuantityApprox ++ testQuantityAmount ++ testQuantityRatio
	, "qstr" ~: testQstr
	, "oparg" ~: testOparg
 	, "opterm" ~: testOpterm
 	]
isResult (Result _) = True
isResult _ = False
isReference (Reference _) = True
isReference _ = False
isAlternative (Alternative _) = True
isAlternative _ = False
isAnnotation (Annotation _) = True
isAnnotation _ = False
isAction (Action _) = True
isAction _ = False
isDirective (Directive _) = True
isDirective _ = False
isUnknown (Unknown _) = True
isUnknown _ = False

6 Language semantics

The parser’s output, a stream of instructions, may contain multiple recipes. A recipe must start with the directive “pesto” and may end with “bonappetit”. This function extracts all recipes from the stream and removes both directives.

extract [] = []
extract (Directive "pesto":stream) = between:extract next
	where
		isEnd (Directive x) | x `elem` ["bonappetit", "pesto"] = True
		isEnd _ = False
		(between, next) = break isEnd stream
extract (_:xs) = extract xs

Start and end directive are removed from the extracted instructions. The directive “bonappetit” is optional at the end of a stream.

testExtract = [
	  extract [Directive "pesto", Directive "bonappetit"] ~?= [[]]
	, extract [Directive "pesto", Action "foobar", Directive "bonappetit"] ~?= [[Action "foobar"]]
	, extract [Directive "pesto"] ~?= [[]]
	, extract [Directive "pesto", Directive "foobar"] ~?= [[Directive "foobar"]]

Instructions surrounding the start and end directive are removed.

	, extract [Unknown "Something", Directive "pesto"] ~?= [[]]
	, extract [Unknown "Something", Action "pour", Directive "pesto"] ~?= [[]]
	, extract [Directive "pesto", Directive "bonappetit", Annotation "something"] ~?= [[]]

The stream may contain multiple recipes. The start directive also ends the previous recipe and starts a new one.

	, extract [Directive "pesto", Action "pour", Directive "bonappetit", Action "foobar", Directive "pesto", Annotation "something"] ~?= [[Action "pour"], [Annotation "something"]]
	, extract [Directive "pesto", Action "heat", Directive "pesto", Annotation "something"] ~?= [[Action "heat"], [Annotation "something"]]
	, extract [Directive "pesto", Annotation "foobar", Directive "pesto", Directive "bonappetit"] ~?= [[Annotation "foobar"], []]
	]

Each recipe’s stream of instructions drives a stack-based machine that transforms it into a directed graph. Think of the stack as your kitchen’s workspace that is used to prepare the food’s components. You can add new ingredients, perform actions on them, put them aside and add them again.

This function processes a list of nodes, that is instructions uniquely identified by an integer and returns the edges of the directed graph as a list of tuples.

toGraph nodes = third $ foldl f (Nothing, [[]], []) nodes
	where

Ingredients are simply added to the current workspace. They should for example appear on the shopping list.

		f ctx (i, Ingredient _) = addToStack ctx i

The same happens for for tools. However they are not part of the final product, but used in the process of making it. For instance they do not appear on the shopping list. Time is a tool.

		f ctx (i, Tool _) = addToStack ctx i

Actions take all ingredients and tools currently on the workspace, perform some action with them and put the product back onto the workspace.

		f (_, stack:sx, edges) (i, Action _) = (Just i, [i]:stack:sx, edgesTo i stack ++ edges)
		f (_, [], _) (_, Action _) = undefined -- never reached

Results add a label to the current workspace’s contents and move them out of the way. It should be a meaningful name, not just A and B obviously. Consecutive Results add different labels to the same workspace. That’s useful when an action yields multiple results at once that are processed in different ways.

		f ctx (i, Result _) = consumeStack ctx i

Alternatives too add a label to the current workspace’s content, but they pick one of things on the workspace and throw everything else away. This allows adding optional or equivalent ingredients to a recipe (i.e. margarine or butter).

		f ctx (i, Alternative _) = consumeStack ctx i

References are similar to ingredients. They are used to add items from a workspace labeled with Result or Alternative. More on that in the next section.

		f ctx (i, Reference _) = addToStack ctx i

Annotations add a description to any of the previous instructions. They can be used to provide more information about ingredients (so “hot water” becomes “+water (hot)”, tools (“&oven (200 °C)”) or actions (“[cook] (XXX)”).

		f ctx@(Nothing, _, _) (_, Annotation _) = ctx
		f (Just prev, s, edges) (i, Annotation _) = (Just prev, s, (i, prev):edges)

Unused directives or unknown instructions are danging nodes with no connection to other nodes.

		f ctx (_, Directive _) = ctx
		f ctx (_, Unknown _) = ctx

These are helper functions:

		addToStack (_, stack:sx, edges) i = (Just i, (i:stack):sx, edges)
		addToStack (_, [], _) _ = undefined -- never reached
		consumeStack (_, s, edges) i =
			let
				stack = dropWhile null s
				(top:sx) = if null stack then [[]] else stack
			in (Just i, []:top:sx, edgesTo i top ++ edges)
		edgesTo i = map (\x -> (x, i))

Here are a few example of how this stack-machine works. Each edge is a tuple of two integer numbers. These are the nodes it connects, starting with zero. Ingredient, Tool and Reference itself do not create any edges:

testGraph = [
	  cmpGraph "+ketchup &spoon *foobar" []

But Action, Alternative and Result do in combination with them:

	, cmpGraph "+foobar [barbaz]" [(0, 1)]
	, cmpGraph "+foobar |barbaz" [(0, 1)]
	, cmpGraph "+foobar >barbaz" [(0, 1)]
	, cmpGraph "+foobar +B >barbaz" [(0, 2), (1, 2)]
	, cmpGraph "+foobar >barbaz +foobar >barbaz" [(0, 1), (2, 3)]
	, cmpGraph "+foobar [barbaz] +foobar >barbaz" [(0, 1), (1, 3), (2, 3)]
	, cmpGraph "&foobar [barbaz] [C] >D" [(0, 1), (1, 2), (2, 3)]

If the stack is empty, i.e. it was cleared by a Result or Alternative instruction, consecutive results or alternatives operate on the previous, non-empty stack.

	, cmpGraph "+foobar >barbaz >C" [(0, 1), (0, 2)]
	, cmpGraph "+foobar |barbaz |C" [(0, 1), (0, 2)]
	, cmpGraph "+foobar >barbaz |C" [(0, 1), (0, 2)]

Unless that stack too is empty. Then they do nothing:

	, cmpGraph ">foobar >foobar" []
	, cmpGraph "|foobar |foobar" []
	, cmpGraph "(foobar) (foobar)" []
	, cmpGraph "[foobar]" []

The Annotation instruction always creates an edge to the most-recently processed node that was not an annotation. Thus two consecutive annotations create edges to the same node.

	, cmpGraph "+foobar (barbaz)" [(1, 0)]
	, cmpGraph "+foobar (barbaz) (C)" [(1, 0), (2, 0)]
	, cmpGraph "+foobar (barbaz) >barbaz" [(1, 0), (0, 2)]
	, cmpGraph "+foobar >barbaz (C)" [(0, 1), (2, 1)]
	, cmpGraph "+foobar |barbaz (C)" [(0, 1), (2, 1)]
	, cmpGraph "*foobar (C)" [(1, 0)]

Unknown directives or instructions are never connected to other nodes.

	, cmpGraph "%invalid" []
	, cmpGraph "invalid" []
	]

6.1 References

Results and alternatives can be referenced with the Reference instruction. Resolving these references does not happen while buiding the graph, but afterwards. This allows referencing an a result or alternative before its definition with regard to the their processing order.

Resolving references is fairly simple: For every reference its object name a case-insensitive looked is performed in a table containing all results and alternatives. If it succeeds an edge from every result and alternative returned to the reference in question is created.

 resolveReferences nodes = foldl f [] nodes
 	where
		f edges (i, ref@(Reference _)) = map (\x -> (x, i)) (findTarget nodes ref) ++ edges
 		f edges _ = edges
 findTarget nodes (Reference (Quantity _ _ a)) = map fst $ filter (isTarget a) nodes
 	where
 		lc = map toLower
 		isTarget dest (_, Result (Quantity _ _ x)) = lc x == lc dest
		isTarget dest (_, Alternative (Quantity _ _ x)) = lc x == lc dest
 		isTarget _ _ = False
 findTarget _ _ = []

References works before or after the result instruction.

testRef = [
	  cmpGraphRef ">foobar *foobar" [(0, 1)]
	, cmpGraphRef ">foobar |foobar *foobar" [(0, 2), (1, 2)]
	, cmpGraphRef "+A >foobar +B >barbaz *foobar *barbaz" [(1, 4), (3, 5)]
	, cmpGraphRef "*foobar >foobar" [(1, 0)]

Nonexistent references do not create an edge.

	, cmpGraphRef ">foobar *barbaz" []

References can use amounts and units.

	, cmpGraphRef ">foobar *1 _ foobar *2 _ foobar" [(0, 1), (0, 2)]

There are a few cases that do not make sense here (like loops or multiple results with the same name). They are permitted at this stage, but rejected later.

	, cmpGraphRef "*foobar |foobar >foobar" [(1, 0), (2, 0)]
	, cmpGraphRef "|foobar *foobar >foobar *foobar" [(0, 1), (0, 3), (2, 1), (2, 3)]
	]

6.2 Appendix

runGraphWith f doc expect = sort edges ~?= sort expect
	where
		(Right op) = (head . extract . snd . unzip) <$> parse ("%pesto " ++ doc)
		nodes = zip [firstNodeId..] op
		edges = f nodes
cmpGraph = runGraphWith toGraph
cmpGraphRef = runGraphWith resolveReferences
type NodeId = Int
type Node a = (NodeId, a)
type Nodes a = [Node a]
type Edge = (NodeId, NodeId)
type Edges = [Edge]
firstNodeId = 0 :: NodeId

Find graph’s root node(s), that is a node without outgoing edges:

 walkRoot nodes edges = let out = nub $ map fst edges
	in filter (\(x, _) -> notElem x out) nodes

Get all nodes with edges pointing towards nodeid

incomingEdges edges (nodeid, _) = filter ((==) nodeid . snd) edges
incomingNodes nodes edges n = map ((!!) nodes . fst) $ incomingEdges edges n
outgoingEdges edges (nodeid, _) = filter ((==) nodeid . fst) edges
outgoingNodes nodes edges n = map ((!!) nodes . snd) $ outgoingEdges edges n
test = ["graph" ~: testGraph, "ref" ~: testRef, "extract" ~: testExtract]
third (_, _, x) = x

7 Linting

Not every graph generated in the previous section is a useful recipe. Some instruction sequences just do not make sense. The tests in this section can detect those. Failing any of them does not render a stream of instructions or graph invalid. They just does not describe a useful recipe. Thus implementations must not generate or export such documents. However they should accept input that fails any of the tests and warn the user about the failure.

Additionally this section provides guidance on how to use the instructions provided by the Pesto language properly.

7.1 Graph properties

  • weakly connected, no dangling nodes/subgraphs
  • acyclic

The graph must have exactly one root node (i.e. a node with incoming edges only). This also requires all results and alternatives to be referenced somewhere. Directives are either consumed when parsing, generating a graph and linting. Otherwise they are dangling as well. Unknown instructions are always dangling.

rootIsResult nodes edges = case walkRoot nodes edges of
	[] -> [LintResult NoRootNode []]
	(_, Result _):[] -> []
	(i, _):[] -> [LintResult NonResultRootNode [i]]
	xs -> [LintResult MoreThanOneRootNode (map fst xs)]

Empty recipes or circular references have no root node:

 testConnectivity = [
 	  cmpLint "" [LintResult NoRootNode [], LintResult NoMetadata []]
 	, cmpLint "*foobar >foobar"
		[LintResult NoRootNode [], LintResult NoMetadata []]
 	, cmpLint "+foobar"
		[LintResult NonResultRootNode [0], LintResult NoMetadata []]

Directives and unknown instructions are dangling and thus root nodes.

 	, cmpLint "invalid %invalid +foo >bar"
 		[LintResult MoreThanOneRootNode [0,1,3], LintResult NoMetadata []]
	]

7.2 Metadata

root node can be alternative too?

The graph’s root node must be a result. It contains yield (amount and unit) and title (object) of the recipe.

 extractMetadata nodes edges = case walkRoot nodes edges of
 		[n@(i, Result q@(Quantity _ _ title))] ->
 			Just $ (i, ("title", MetaStr title))
 				:(i, ("yield", MetaQty q))
 				:foldl f [] (incomingNodes nodes edges n)
		_ -> Nothing
 	where

Additional key-value metadata for the whole recipe can be added as annotations to the root node. If multiple annotations with the same key exist the key maps to a list of those values. Annotations that are unparseable key-value pairs are added as recipe description instead.

		f xs (i, Annotation s) = case parseMetadata s of
				Left _ -> (i, ("description", MetaStr s)):xs
				Right (k, v) -> (i, (k, MetaStr v)):xs
		f xs _ = xs

Key and value are separated by a colon. Keys must not contain whitespace or the colon char. A value may be empty.

parseMetadata = runParser metadata () ""
metadata = let keychars = satisfy (\x -> not (isSpace x) && x /= ':') in (,)
	<$> many1 keychars
	<*> (char ':' *> spaces *> many anyChar)
lintMetadata nodes edges = case extractMetadata nodes edges of
		Just result -> foldl checkKey [] result
		Nothing -> [LintResult NoMetadata []]
	where
		checkKey xs (_, (k, _)) | isKeyKnown k = xs
		checkKey xs (i, _) = LintResult UnknownMetadataKey [i]:xs

Valid metadata keys are listed below. Additionally applications may add keys by prefixing them with “x-myapp-”, thus an application called “basil” adding “some-key” would use the full key “x-basil-some-key”.

		isKeyKnown k = k `elem` knownKeys || "x-" `isPrefixOf` k

The following metadata keys are permitted:

		knownKeys = [

Both, title and description, are implicit.

			  "title"
 			, "description"

The recipe’s language, as 2 character code (ISO 639-1).

			, "language"

Yield and time both must be a quantity.

			, "yield"
			, "time"

An image can be a relative file reference or URI

			, "image"
			, "author"
			]

Check the metadata’s value format. I.e. yield/time must be quantity

For instance a german language recipe for one person would look like this:

 testMetadata = [
 	  cmpLintMeta "+foo >1 ml foobar (language: de) (x-app-key: value)"
		[]
		(Just [(1, ("title", MetaStr "foobar"))
 			, (1, ("yield", MetaQty (Quantity (Exact (AmountRatio (1%1))) "ml" "foobar")))
			, (2, ("language", MetaStr "de"))
 			, (3, ("x-app-key", MetaStr "value"))])

Unparseable annotations or unknown keys are linting errors:

 	, cmpLintMeta "+foo >foobar (unknown-key: value)"
 		[LintResult UnknownMetadataKey [2]]
 		(Just [(1, ("title", MetaStr "foobar"))
			, (1, ("yield", MetaQty (strQuantity "foobar")))
 			, (2, ("unknown-key", MetaStr "value"))])

Root node annotations not containing a parseable key-value pair are assigned the key “description”.

	, cmpLintMeta "+foo >foobar ( some description ) (another one: with colon) (another:     valid key-value)"
		[LintResult UnknownMetadataKey [4]]
		(Just [(1, ("title", MetaStr "foobar"))
			, (1, ("yield", MetaQty (strQuantity "foobar")))
			, (2, ("description", MetaStr " some description "))
			, (3, ("description", MetaStr "another one: with colon"))
			, (4, ("another", MetaStr "valid key-value"))])
	]

7.3 Time is a tool

By definition time is a tool and not an ingredient.

timeUnits = ["s", "min", "h", "d"]

isTime (Quantity _ unit "") | unit `elem` timeUnits = True
isTime _ = False
 timeIsATool nodes _ = foldl f [] nodes
 	where
 		f xs (nodeid, Ingredient q) | isTime q = LintResult TimeIsATool [nodeid]:xs
		f xs _ = xs
testLintQuantity = [
	  cmpLint "+10 min >foo" [LintResult TimeIsATool [0]]
	, cmpLint "+10-12 h >foo" [LintResult TimeIsATool [0]]
	, cmpLint "+90/120 s >foo" [LintResult TimeIsATool [0]]
	, cmpLint "+~12 s >foo" [LintResult TimeIsATool [0]]
	, cmpLint "&10 min [bar] >foo" []
	]

Only actions can be annotated with a time. It can be used to indicate how long a certain action is expected to take (i.e. peeling potatoes takes two minutes) or how long the action is supposed to be executed (i.e. cook five minutes). More time annotations improve the software’s scheduling capabilities.

timeAnnotatesAction nodes edges = foldl f [] nodes
	where
		f xs n@(nodeid, Tool q) | isTime q && (not . allActions) (outgoingEdges edges n) = LintResult TimeAnnotatesAction [nodeid]:xs
		f xs _ = xs
		toNodelist = (!!) nodes . snd
		allActions = all (isAction . snd . toNodelist)

For example “cook 10 minutes” can be expressed with

testTimeAnnotatesAction = [
	  cmpLint "&10 min [cook] >soup" []
	, cmpLint "&10 min [cook] &5-6 h [cook again] >soup" []
	, cmpLint "&10 min >soup" [LintResult TimeAnnotatesAction [0]]
	, cmpLint "&10 min &15 min |time *time [cook] >soup"
		[LintResult TimeAnnotatesAction [0], LintResult TimeAnnotatesAction [1]]
	]

7.4 Well-known units

Units can be an arbitrary strings, but implementations should recognize the common metric units g (gram), l (litre) and m (metre). One of these prefixes may be used with each of them: m (milli-), c (centi-), d (dezi-) and k (kilo-). Additionally time in s (second), min (minute), h (hour), d (day) should be accepted.

 wellKnownUnit nodes _ = foldl f [] nodes
 	where
 		extractQty (Ingredient q) = Just q
 		extractQty (Tool q) = Just q
 		extractQty (Result q) = Just q
 		extractQty (Alternative q) = Just q
 		extractQty (Reference q) = Just q
 		extractQty _ = Nothing
 		f xs (nodeid, instr) | fromMaybe False (extractQty instr >>= (return . not . known)) =
 				LintResult UnitNotWellKnown [nodeid]:xs
		f xs _ = xs
 		known (Quantity _ unit _) = unit `elem` knownUnits
 		knownUnits = [
 				  ""
 				, "mg", "g", "kg"
 				, "ml", "cl", "dl", "l"
 				, "cm", "dm", "m"
 				] ++ timeUnits

Usage of imperial units (inch, pound, …) as well as non-XXX units like “teaspoon”, “cup”, … is discouraged because the former is used by just three countries in the world right now and the latter is language- and country-dependent. The implementation may provide the user with a conversion utility.

  • example: 1 oz ~= 28.349523125 g, can only be approximated by rational number, for instance 29767/1050 g
  • 15 oz would are \(\frac{29767}{70} \mathrm{g} = 425+\frac{17}{70} \mathrm{g}\), since nobody sells 17/70 g the implementation would round down to ~425 g (although <1g is not really enough to justify adding approx)
testLintWellKnownUnit = [
	  cmpLint "+1 in foobar >bar" [LintResult UnitNotWellKnown [0]]
	, cmpLint "+2 teaspoons foobar >bar" [LintResult UnitNotWellKnown [0]]
	, cmpLint "+3 cups foobar >bar" [LintResult UnitNotWellKnown [0]]
	, cmpLint "+1 ml foobar >bar" []
	, cmpLint "+1 cl foobar >bar" []
	, cmpLint "+1 dl foobar >bar" []
	, cmpLint "+1 l foobar >bar" []
	, cmpLint "+2 _ something >bar" []
	, cmpLint "&1 min [foo] >bar" []

The unit is case-sensitive, thus

Should we allow case-insensitive units? References are case-insensitive as well…

	, cmpLint "+1 Mg foobar >bar" [LintResult UnitNotWellKnown [0]]
	, cmpLint "+1 kG foobar >bar" [LintResult UnitNotWellKnown [0]]
	, cmpLint "&1 MIN [foo] >bar" [LintResult UnitNotWellKnown [0]]
	]

7.5 References

All references must be resolved. An earlier check makes sure all results and alternatives are referenced at some point.

 referencesResolved nodes edges = foldl f [] nodes
 	where
 		f xs n@(nodeid, Reference _) | null (incomingEdges edges n) =
 				LintResult UndefinedReference [nodeid]:xs
		f xs _ = xs
testLintRefs = [
	  cmpLint "*foobar >foobar >barbaz" []
	, cmpLint "*foobar >foo" [LintResult UndefinedReference [0]]
	]

A result must have at least one incoming edge. This is a special case and can only occur at the beginning of a recipe.

 resultNonempty nodes edges = foldl f [] nodes
 	where
 		f xs n@(nodeid, Result _) | null (incomingEdges edges n) =
 				LintResult TooFewChildren [nodeid]:xs
		f xs _ = xs
testLintResultNonempty = [
	  cmpLint ">bar *bar >baz" [LintResult TooFewChildren [0]]
	, cmpLint "+A >bar *bar >baz" []
	, cmpLint "+A >bar >foo *bar *foo >baz" []
	]

Alternatives must have at least two incoming edges since a smaller amount would make the alternative pointless.

 twoAlternatives nodes edges = foldl f [] nodes
 	where
		f xs n@(nodeid, Alternative _) | length (incomingEdges edges n) < 2 =
 				LintResult TooFewChildren [nodeid]:xs
		f xs _ = xs
testLintTwoAlternatives = [
	  cmpLint "+A |foo *foo >bar" [LintResult TooFewChildren [1]]
	, cmpLint "+A +B |foo *foo >bar" []

should we allow this? it does not make sense imo

	, cmpLint "+A &B |foo *foo >bar" []
	]
  • reject loops
  • reject multiple results/alternatives with the same name

7.6 Ranges

The first amount of a range ratio must be strictly smaller than the second. This limitation is not enforced for ranges containing strings.

 rangeFromLargerThanTo nodes _ = foldl f [] nodes
 	where
		f xs (nodeid, Ingredient q) | not $ rangeOk q =
 				LintResult RangeFromLargerThanTo [nodeid]:xs
		f xs (nodeid, Reference q) | not $ rangeOk q =
 				LintResult RangeFromLargerThanTo [nodeid]:xs
		f xs _ = xs
 		rangeOk (Quantity (Range (AmountRatio a) (AmountRatio b)) _ _) = a < b
 		rangeOk _ = True
testRangeFromLargerThanTo = [
	  cmpLint "+2-3 l water >bar" []
	, cmpLint "+3-2 l water >bar" [LintResult RangeFromLargerThanTo [0]]
	, cmpLint "+2/3-1/3 l water >bar" [LintResult RangeFromLargerThanTo [0]]
	, cmpLint "+some-many _ eggs >bar" []
	, cmpLint "+1-\"a few\" _ eggs >bar" []
	]

7.7 Appendix

 data LintResult = LintResult LintStatus [NodeId] deriving (Show, Eq, Ord)
 data LintStatus =
 	  NoRootNode
 	| NonResultRootNode
 	| MoreThanOneRootNode
 	| UndefinedReference
 	| TooFewChildren
	| TimeIsATool
 	| TimeAnnotatesAction
	| UnitNotWellKnown
 	| InvalidNode
 	| RangeFromLargerThanTo
 	| NoMetadata
 	| UnknownMetadataKey
 	deriving (Show, Eq, Ord)

Every lint test checks a single aspect of the graph.

lint nodes edges = concatMap (\f -> f nodes edges) lintTests
 lintTests = [
 	  rootIsResult
	, referencesResolved
	, resultNonempty
	, twoAlternatives
 	, timeIsATool
 	, timeAnnotatesAction
 	, wellKnownUnit
 	, lintMetadata
 	, rangeFromLargerThanTo
 	]
 withGraph doc f = f nodes edges
	where
 		(Right op) = (head . extract . snd . unzip) <$> parse ("%pesto " ++ doc)
 		nodes = zip [firstNodeId..] op
 		edges = toGraph nodes ++ resolveReferences nodes
cmpLint doc expect = withGraph doc (\nodes edges -> doc ~: sort (lint nodes edges) ~?= sort expect)
data Metadata = MetaQty Quantity | MetaStr String deriving (Show, Eq)
cmpLintMeta doc expectLint expectMeta = withGraph doc (\nodes edges -> doc ~: [
		  sort (lint nodes edges) ~?= sort expectLint
		, extractMetadata nodes edges ~?= expectMeta
		])
strQuantity = Quantity (Exact (AmountStr "")) ""
test = [
	  testConnectivity
	, testMetadata
	, testLintRefs
	, testLintQuantity
	, testLintWellKnownUnit
	, testTimeAnnotatesAction
	, testLintTwoAlternatives
	, testLintResultNonempty
	, testRangeFromLargerThanTo
	]

8 Serializing

class Serializeable a where
	serialize :: a -> String
  • Add instance for graph
  • use \(\mathcal{O}(1)\) string builder

Finally transform linear stream of instructions into a string again:

instance Serializeable a => Serializeable [a] where
	serialize ops = unlines $ map serialize ops
instance Serializeable Instruction where
	serialize (Annotation s) = quote '(' ')' s
	serialize (Ingredient q) = '+':serialize q
	serialize (Tool q) = '&':serialize q
	serialize (Action s) = quote '[' ']' s
	serialize (Reference q) = '*':serialize q
	serialize (Result q) = '>':serialize q
	serialize (Alternative q) = '|':serialize q
	serialize (Directive s) = '%':serializeQstr s
	serialize (Unknown s) = s
instance Serializeable Quantity where
	serialize (Quantity a b "") = serialize a ++ " " ++ serializeQstr b
	serialize (Quantity (Exact (AmountStr "")) "" c) = serializeQstr c
	serialize (Quantity a "" c) = serialize a ++ " _ " ++ serializeQstr c
	serialize (Quantity a b c) = serialize a ++ " " ++ serializeQstr b ++ " " ++ serializeQstr c
instance Serializeable Approximately where
	serialize (Range a b) = serialize a ++ "-" ++ serialize b
	serialize (Approx a) = '~':serialize a
	serialize (Exact a) = serialize a

There are two special cases here, both for aesthetic reasons:

  1. If the denominator is one we can just skip printing it, because \(\frac{2}{1} = 2\) and
  2. if the numerator is larger than the denominator use mixed fraction notation, because \(\frac{7}{2} = 3+\frac{1}{2}\)
instance Serializeable Amount where
	serialize (AmountRatio a) | denominator a == 1 = show (numerator a)
	serialize (AmountRatio a) | numerator a > denominator a =
		show full ++ "/" ++ show num ++ "/" ++ show denom
		where
			full = numerator a `div` denom
			num = numerator a - full * denom
			denom = denominator a
	serialize (AmountRatio a) = show (numerator a) ++ "/" ++ show (denominator a)
	serialize (AmountStr s) = serializeQstr s
serializeQstr "" = "_"
serializeQstr s | (not . isLetter . head) s || hasSpaces s = quote '"' '"' s
serializeQstr s = s
hasSpaces = any isSpace
quote start end s = [start] ++ concatMap (\c -> if c == end then ['\\', end] else [c]) s ++ [end]

9 Using this project

This project uses cabal. It provides the Codec.Pesto library that implements the Pesto language as described in the previous sections. It also comes with three binaries.

9.1 User interface

The user-interface has different modes of operation. All of them read a single recipe from the standard input.

 main = do
 	(op:_) <- getArgs
 	s <- getContents
	either malformedRecipe (run op) (parse s)
run "dot" = runDot
run "metadata" = runMeta
run "ingredients" = runIngredients
run _ = const (putStrLn "unknown operation")
malformedRecipe = print
streamToGraph stream = (nodes, edges)
	where
		doc = (head . extract . snd . unzip) stream
		nodes = zip [firstNodeId..] doc
		edges = toGraph nodes ++ resolveReferences nodes

9.1.1 dot

Since each recipe is just a directed graph (digraph), GraphViz’ dot language can represent recipes as well. Example:

cabal run --verbose=0 pesto dot < spaghetti.pesto | dot -Tpng > spaghetti.png
 runDot stream = putStrLn $ toDot dotNodes dotEdges
 	where
 		(nodes, edges) = streamToGraph stream
 		maxId = (maximum $ map fst nodes) + 1
 		(lintNodes, lintEdges) = unzip $ map (uncurry lintToNodesEdges)
 				$ zip [maxId..] (lint nodes edges)
 		dotNodes = concat [
				  [("node", [("fontname", "Roboto Semi-Light")])]
 				, map (\(a, label) -> (show a, [("label", serialize label)])) nodes
 				, lintNodes
				]
 		dotEdges = concat [
 				  map (both show) edges
 				, concat lintEdges
 				]
lintToNodesEdges nodeid (LintResult t nodes) = let
	n = (show nodeid, [("label", show t), ("color", "red")])
	e = map (\i -> both show (nodeid, i)) nodes
	in (n, e)
both f (a, b) = (f a, f b)
 toDot nodes edges = "digraph a {"
 		<> mconcat (map nodeToDot nodes)
 		<> mconcat (map edgeToDot edges)
 		<> "}"
 	where
 		edgeToDot (a, b) = a <> " -> " <> b <> ";"
		nodeToDot (a, b) = a <> " [" <> mconcat (mapToDot b) <> "];"
mapToDot = map kvToDot
kvToDot (k, v) = k <> "=\"" <> quoteString v <> "\""
quoteString s = mconcat $ map quoteChar s
quoteChar '\n' = "\\n"
quoteChar '"' = "\\\""
quoteChar x = [x]

9.1.2 metadata

Print metadata as key-value pairs, separated by =.

runMeta stream = maybe (return ()) (mapM_ printMeta) $ uncurry extractMetadata $ streamToGraph stream

9.1.3 ingredients

Extract ingredients and print them in CSV format. This does not take alternatives into account yet.

runIngredients stream = mapM_ (putStrLn . csvQty) $ reverse $ foldl getIngredient [] stream
	where
		getIngredient xs (_, Ingredient q) = q:xs
		getIngredient xs _ = xs
printMeta (_, (key, MetaStr value)) = putStrLn $ key ++ "=" ++ value
printMeta (_, (key, MetaQty q)) = putStrLn $ key ++ "=" ++ csvQty q
csvQty (Quantity a b c) = intercalate "," [serialize a, b, c]

9.2 Running tests

The testcases can be run with cabal test. This runs all testcases from all modules and prints a summary.

main = runTestTT tests >>= \c -> if errors c + failures c > 0 then exitFailure else exitSuccess
tests = TestList [
	  "parse" ~: Codec.Pesto.Parse.test
	, "graph" ~: Codec.Pesto.Graph.test
	, "lint" ~: Codec.Pesto.Lint.test
	]

9.3 Building documentation

The HTML documentation is generated directly from the source code of Codec.Pesto by running cabal run pesto-doc. That module serves as starting point and it includes the other modules in a sensible order. Pandoc renders the restructuredText to HTML. We use a slightly modified template.

main = do
	tpl <- readFile "template.html"
	doc <- readWithInclude "src/Codec/Pesto.lhs"
	writeFile "_build/index.html" $ rstToHtml tpl doc

Since pandoc currently does not support restructured text’s include directive directly, emulate it by recursively replacing all lines starting with .. include:: with the referenced file’s contents.

 readWithInclude f = do
 	c <- readFile f
 	let l = lines c
 	linc <- mapM (\line -> case stripPrefix ".. include:: " line of
		Just incfile -> readWithInclude $ replaceFileName f incfile
		Nothing -> return line) l
 	return $ unlines linc

The resulting string is parsed as literate Haskell with restructuredText markup.

readDoc = readRST def {
	  readerExtensions = S.fromList [
		  Ext_literate_haskell
		, Ext_implicit_header_references
		]
	, readerStandalone = True }

Module definitions and imports should not be visible in the final documentation. They are marked up with the class nodoc and removed from the doctree before transforming it into HTML.

dropNoDoc = topDown f
	where
		f (Div (_, classes, _) _) | "nodoc" `elem` classes = Null
		f x = x
rstToHtml tpl = writeDoc tpl . dropNoDoc . handleError . readDoc
 writeDoc tpl = writeHtmlString def {
 	  writerStandalone = True
 	, writerTemplate = tpl
 	, writerHtml5 = True
 	, writerHighlight = True
 	, writerHighlightStyle = tango
	, writerNumberSections = True
 	, writerSectionDivs = True
 	, writerTabStop = 4
 	, writerHTMLMathMethod = MathJax "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"
 	, writerVariables = [("css", "pesto.css"), ("lang", "en")]
 	}

10 Final words

Do we even need this?