Pesto specification draft

Pesto is a text-based human-editable and machine-transformable cooking recipe interchange format.

Warning

This specification is a work in progress and thus neither stable, consistent nor complete.

1 About this document

This section contains various information about this document. The second section motivates why inventing another file format is necessary, followed by the goals of Pesto. After a short Pesto primer intended for the casual user the language’s syntax and semantics are presented. The linting section limits the language to useful cooking recipes. Examples for user presentation of recipes and serialization follow.

Being a literate program this document is specification and reference implementation at the same time. The code is written in Haskell and uses the parsec parser combinator library, as well as HUnit for unit tests. Even without knowing Haskell’s syntax you should be able to understand this specification. There’s a description above every code snippet explaining what is going on.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Version

1-draft

License

CC0

Website

https://6xq.net/pesto/

Discussion

https://github.com/PromyLOPh/pesto

Contributors

2 Motivation

The landscape of recipe interchange formats is quite fragmented. First of all there’s HTML microdata. Google rich snippets, which are equivalent to the schema.org microdata vocabulary, are widely used by commercial recipe sites. Although the main objective of microdata is to make content machine-readable, most sites probably use it because it is considered search-engine optimization (SEO). Additionally, parsing HTML pulled from the web is a nightmare and thus not a real option for sharing recipes. h-recipe provides a second vocabulary that has not been adopted widely yet.

Most cooking-related software comes with its own recipe file format. Some of them, due to their age, can be imported by other programs.

Meal-Master is one of these widely supported formats. A huge trove of recipe files is available in this format. There does not seem to be any official documentation for the format, but an unofficial ABNF grammar and format description exist. A Meal-Master recipe template might look like this:

---------- Recipe via Meal-Master (tm)

      Title: <Title>
 Categories: <Categories>
      Yield: <N servings>

    <N> <unit> <ingredient>
    …

-------------------------------<Section name>-----------------------------
  <More ingredients>

  <Instructions>

-----

Rezkonv aims to improve the Meal-Master format by lifting some of its character limits, adding new syntax and translating it to German. However the specification is available on request only.

A second format some programs can import is MasterCook’s MXP file format, as well as its XML-based successor MX2. And then there’s a whole bunch of more-or-less proprietary formats:

Living Cookbook

Uses an XML-based format called fdx version 1.1. There’s no specification to be found, but a few examples are available and those are dated 2006.

My CookBook

Uses the file extension .mcb. A specification is available.

KRecipes

Uses its own export format. However there is no documentation whatsoever.

Gourmet

The program’s export format suffers from the same problem. The only document available is the DTD.

CookML

Last updated in 2006 (version 1.0.4) for the German-language shareware program Kalorio. It has a custom and restrictive licence that requires attribution and forbids derivative works.

Paprika

Cross-platform application, supports its own “emailed recipe format” and a simple YAML-based format.

Between 2002 and 2005 a bunch of XML-based exchange formats were created. They are not tied to a specific piece of software, so none of them seems to be actively used nowadays:

RecipeML

Formerly known as DESSERT and released in 2002 (version 0.5). The license requires attribution and – at the same time – forbids using the name RecipeML for promotion without written permission.

eatdrinkfeelgood

Version 1.1 was released in 2002 as well, but the site is not online anymore. The DTD is licensed under the CC by-sa license.

REML

Released in 2005 (version 0.5), aims to improve support for commercial uses (restaurant menus and cookbooks). The XSD’s license permits free use and redistribution, but the reference implementation has no licensing information.

RecipeBook XML

Released in 2005 as well and shared under the terms of CC by-sa. It is not available on the web any more.

Finally, a few non-XML or obscure exchange formats have been created in the past: YumML is an approach similar to those listed above, but based on YAML instead of XML. The specification has been removed from the web and is available through the Web Archive only.

Cordon Bleu (1999) encodes recipes as programs for a cooking machine and defines a Pascal-like language. Being so close to real programming languages, Cordon Bleu is barely usable by anyone except programmers. Additionally the language is poorly designed, since its syntax is inconsistent and the user is limited to a set of predefined functions.

Finally there is RxOL, created in 1985. It constructs a graph from recipes written down with a few operators and postfix notation. It does not separate ingredients and cooking instructions like every other syntax introduced before. Although Pesto is not a direct descendant of RxOL both share many ideas.

microformats.org has a similar list of recipe interchange formats.

3 Goals

First of all recipes are written by humans for humans. Thus a human-readable recipe interchange format is not enough. The recipes need to be human-editable without guidance like a GUI or assistant. That’s why, for instance, XML is not suitable and the interchange formats listed above have largely failed to gain traction. XML, even though simple itself, is still too complicated for the ordinary user. Instead a format needs to be as simple as possible, with as little markup as possible. A human editor must be able to remember the entire syntax. This works best if the file contents “make sense”. A good example for this is Markdown.

We also have to acknowledge that machines play an important role in our daily life. They can help us, the users, accomplish our goals if they are able to understand the recipes as well. Thus they too need to be able to read and write recipes. Again, designing a machine-readable format is not enough. Recipes must be machine-transformable. A computer program should be able to create a new recipe from two existing ones, look up the ingredients and tell us how many joules one piece of that cake will have. And so on.

That being said, Pesto does not aim to carry additional information about ingredients or recipes itself. Nutrition data for each ingredient should be maintained in a separate database. Due to its minimal syntax Pesto is also not suitable for extensive guides on cooking or the usual chitchat found in cooking books.

4 Introduction by example

So let’s start by introducing Pesto by example. This text does not belong
to the recipe and is ignored by any software. The following line starts the
recipe:

%pesto

&pot
+1 l water
+salt
[boil]

+100 g penne
&10 min
[cook]

>1 serving pasta
(language: en)

And that’s how you make pasta: Boil one liter of water in a pot with a little bit of salt. Then add 100 g penne, cook them for ten minutes and you get one serving pasta. That’s all.

There’s more syntax available to express alternatives (either penne or tagliatelle), ranges (1–2 l water or approximately 1 liter water) and metadata. But now you can have a first peek at my own recipe collection.

5 Language syntax

Pesto parses UTF-8 encoded input data consisting of space-delimited instructions. Every character within the Unicode whitespace class is considered a space.

The following instructions are supported:

The pesto grammar has two instruction types: The first one begins with a start symbol (start) and consumes any character up to and including a terminating symbol (end), which can be escaped with a backslash (\).

Annotations and actions both are of this kind:

Here are examples for both:
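As an illustration, the first instruction type can be sketched as follows. This is not the reference implementation: it uses ReadP from base instead of parsec, and it assumes that annotations are delimited by ( and ) and actions by [ and ], as in the example recipe of section 4. The names betweenEscaped and parseAll are hypothetical.

```haskell
import Text.ParserCombinators.ReadP

-- Hypothetical type covering only the two instructions of the first kind.
data Instruction = Annotation String | Action String
  deriving (Show, Eq)

-- Consume `start`, then any characters up to and including `end`.
-- The sequence backslash + `end` stands for a literal `end`;
-- any other backslash is kept as-is (an assumption of this sketch).
betweenEscaped :: Char -> Char -> ReadP String
betweenEscaped start end = char start *> many item <* char end
  where
    item = (end <$ string ['\\', end]) <++ satisfy (/= end)

annotation, action :: ReadP Instruction
annotation = Annotation <$> betweenEscaped '(' ')'
action = Action <$> betweenEscaped '[' ']'

-- Succeed only if the parser consumes the entire input.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case [x | (x, "") <- readP_to_S p s] of
  (x : _) -> Just x
  [] -> Nothing
```

With these definitions, parseAll annotation applied to the text (hot) yields the annotation “hot”, while an unterminated input such as (open is rejected.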

The second one starts with one identifying character, ignores the following whitespace characters and then consumes an object or a quantity.

Additionally there are two special instructions. Directives are similar to the previous instructions, but consume a qstr.

Unknown instructions are the fallthrough case and accept anything. They must not be discarded at this point. The point of accepting anything is to fail as late as possible while processing input. This gives the parser a chance to print helpful messages that aid the user, who can then fix the problem.

Below are examples for these instructions:

5.1 Qstr

Before introducing quantities we need to have a look at qstr, which is used by them. A qstr, short for quoted string, can be – you guessed it already – a string enclosed in double quotes, a single word or the underscore character that represents the empty string.

A word always starts with a letter, followed by any number of non-space characters.

The empty string can be represented by two double quotes or the underscore, but not the empty string itself.

Any Unicode character with a General_Category major class L (i.e. a letter, see Unicode standard section 4.5 for example) is accepted as first character of a word. That includes German umlauts as well as Greek or Arabic script. Numbers, separators, punctuation and others are not permitted.

The remaining letters of a word can be any character, including symbols, numbers, …

…but not spaces.

If a string contains spaces or starts with a special character it must be enclosed in double quotes.

Double quotes within a string can be quoted by prepending a backslash. However the usual escape codes like \n, \t, … will not be expanded.
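The rules above can be sketched as a small parser. This is only an approximation of the specification: it uses ReadP from base rather than parsec, and it assumes that Data.Char.isLetter and Data.Char.isSpace are close enough to the Unicode letter and whitespace classes. The helper parseAll is hypothetical.

```haskell
import Data.Char (isLetter, isSpace)
import Text.ParserCombinators.ReadP

-- A qstr is a quoted string, a single word or the underscore.
qstr :: ReadP String
qstr = quoted <++ word <++ underscore
  where
    quoted = char '"' *> many qchar <* char '"'
    -- Only \" is an escape; \n, \t and friends are kept verbatim.
    qchar = ('"' <$ string "\\\"") <++ satisfy (/= '"')
    -- A letter followed by any number of non-space characters.
    word = (:) <$> satisfy isLetter <*> munch (not . isSpace)
    -- The underscore represents the empty string.
    underscore = "" <$ char '_'

-- Succeed only if the parser consumes the entire input.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case [x | (x, "") <- readP_to_S p s] of
  (x : _) -> Just x
  [] -> Nothing
```

Note that the empty input is rejected, whereas both "" and _ produce the empty string.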

5.2 Quantity

The instructions Ingredient, Tool and Reference accept a quantity, that is a triple of Approximately, Unit and Object as parameter.

The syntactic construct is overloaded and accepts one to three arguments. If just one is given it is assumed to be the Object and Approximately and Unit are empty. Two arguments set Approximately and Unit, which is convenient when the unit implies the object (minutes usually refer to the object time, for example).

The first two are equivalent to

Missing units must not be omitted. The version with underscore should be preferred.
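The overloading can be illustrated with a deliberately simplified sketch that works on arguments which have already been parsed as qstr values. All three fields are plain strings here, which is an assumption of the sketch; the specification wraps the amount in Approximately. The names Quantity and fromArguments are hypothetical.

```haskell
-- Hypothetical simplification: every field is a plain string.
data Quantity = Quantity
  { approximately :: String
  , unit :: String
  , object :: String
  } deriving (Show, Eq)

-- One argument is the object, two are amount and unit,
-- three fill the whole triple. Anything else is rejected.
fromArguments :: [String] -> Maybe Quantity
fromArguments [o] = Just (Quantity "" "" o)
fromArguments [a, u] = Just (Quantity a u "")
fromArguments [a, u, o] = Just (Quantity a u o)
fromArguments _ = Nothing
```

For a unitless amount such as three eggs, the unit is passed as the empty string (written as an underscore in the source), so that the amount is not mistaken for the two-argument form.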

Units and objects are just strings. However units should be limited to well-known metric units and some guidelines apply to Objects as well.

Approximately is a wrapper for ranges (two amounts separated by a dash), approximate amounts (prepended with a tilde) and exact amounts (without modifier).

Amounts are limited to rational numbers and strings. There are no real numbers by design and implementations should avoid representing rational numbers as IEEE float. They are not required and introduce ugly corner cases when rounding while converting units for example.

Rational numbers can be written as an integral part, numerator and denominator, each separated by a forward slash; as just numerator and denominator, again separated by a forward slash; or as just a numerator with the default denominator 1 (i.e. an ordinary integral number).

These are all equal.

Two numbers are numerator and denominator.

Three numbers are integral part, numerator and denominator.

Rational numbers can be used in ranges and approximate amounts too, and can be mixed with strings.
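A minimal sketch of the amount syntax, restricted to rational numbers (string amounts are left out). It uses ReadP from base instead of parsec and assumes that the range separator is the ASCII hyphen-minus and that a zero denominator is rejected. The constructor names Range, Approx and Exact as well as the helper parseAll are assumptions of this sketch.

```haskell
import Data.Char (isDigit)
import Data.Ratio ((%))
import Text.ParserCombinators.ReadP

data Approximately a = Range a a | Approx a | Exact a
  deriving (Show, Eq)

int :: ReadP Integer
int = read <$> munch1 isDigit

-- int/num/denom, num/denom or a plain integer.
ratio :: ReadP Rational
ratio = mixed <++ fraction <++ whole
  where
    mixed = do
      i <- int <* char '/'
      n <- int <* char '/'
      d <- positive
      pure (fromInteger i + n % d)
    fraction = (%) <$> (int <* char '/') <*> positive
    whole = fromInteger <$> int
    -- Reject a zero denominator instead of crashing in (%).
    positive = do
      d <- int
      if d == 0 then pfail else pure d

approximately :: ReadP (Approximately Rational)
approximately = range <++ approx <++ exact
  where
    range = Range <$> (ratio <* char '-') <*> ratio
    approx = Approx <$> (char '~' *> ratio)
    exact = Exact <$> ratio

-- Succeed only if the parser consumes the entire input.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case [x | (x, "") <- readP_to_S p s] of
  (x : _) -> Just x
  [] -> Nothing
```

Because the result is a Rational, the inputs 3/2, 6/4 and 1/1/2 all parse to the same value without any rounding.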

5.3 Appendix

Test helpers:

A generic parser error:

Compare output of parser f for string str with expected. The expected result can be a parser error, which matches any actual parse error (first case).

Wrap qstr test in AmountStr to aid serialization test

6 Language semantics

The parser’s output, a stream of instructions, may contain multiple recipes. A recipe must start with the directive “pesto” and may end with “buonappetito”. This function extracts all recipes from the stream and removes both directives.

Start and end directive are removed from the extracted instructions. The directive “buonappetito” is optional at the end of a stream.

Instructions surrounding the start and end directive are removed.

The stream may contain multiple recipes. The start directive also ends the previous recipe and starts a new one.
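The extraction rules can be sketched as follows. The Instruction type is reduced to what the sketch needs, and the names extract and isBoundary are hypothetical rather than taken from the reference implementation.

```haskell
-- Hypothetical, reduced instruction type: directives and everything else.
data Instruction = Directive String | Other String
  deriving (Show, Eq)

-- Split a stream into recipes. Everything before "pesto" and after
-- "buonappetito" is dropped, and so are the two directives themselves.
extract :: [Instruction] -> [[Instruction]]
extract [] = []
extract (Directive "pesto" : rest) = recipe : extract remaining
  where
    -- A recipe ends at the next boundary or at the end of the stream.
    (recipe, remaining) = break isBoundary rest
extract (_ : rest) = extract rest

isBoundary :: Instruction -> Bool
isBoundary (Directive "pesto") = True
isBoundary (Directive "buonappetito") = True
isBoundary _ = False
```

A second “pesto” directive is left at the head of the remaining stream by break, which is how it both ends the previous recipe and starts the next one.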

Each recipe’s stream of instructions drives a stack-based machine that transforms it into a directed graph. Think of the stack as your kitchen’s workspace that is used to prepare the food’s components. You can add new ingredients, perform actions on them, put them aside and add them again.

This function processes a list of nodes, that is instructions uniquely identified by an integer and returns the edges of the directed graph as a list of tuples.

Ingredients are simply added to the current workspace. They should for example appear on the shopping list.

The same happens for tools. However they are not part of the final product, but used in the process of making it. For instance they do not appear on the shopping list. Time is a tool.

Actions take all ingredients and tools currently on the workspace, perform some action with them and put the product back onto the workspace.

Results add a label to the current workspace’s contents and move them out of the way. It should be a meaningful name, not just A and B obviously. Consecutive Results add different labels to the same workspace. That’s useful when an action yields multiple results at once that are processed in different ways.

Alternatives too add a label to the current workspace’s content, but they pick one of the things on the workspace and throw everything else away. This allows adding optional or equivalent ingredients to a recipe (e.g. margarine or butter).

References are similar to ingredients. They are used to add items from a workspace labeled with Result or Alternative. More on that in the next section.

Annotations add a description to any of the previous instructions. They can be used to provide more information about ingredients (so “hot water” becomes “+water (hot)”), tools (“&oven (200 °C)”) or actions (“[cook] (XXX)”).

Unused directives or unknown instructions are dangling nodes with no connection to other nodes.

These are helper functions:

Here are a few examples of how this stack machine works. Each edge is a tuple of two integer numbers. These are the nodes it connects, starting with zero. Ingredient, Tool and Reference themselves do not create any edges:

But Action, Alternative and Result do in combination with them:

If the stack is empty, i.e. it was cleared by a Result or Alternative instruction, consecutive results or alternatives operate on the previous, non-empty stack.

Unless that stack too is empty. Then they do nothing:

The Annotation instruction always creates an edge to the most-recently processed node that was not an annotation. Thus two consecutive annotations create edges to the same node.

Unknown directives or instructions are never connected to other nodes.
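The following sketch is one possible reading of the stack machine described in this section, not the reference implementation. It assumes that an action replaces the current workspace with its product, that results and alternatives push a fresh, empty workspace, and that directives and unknown instructions count as the “most-recently processed node” for annotations. Instruction payloads are reduced to strings, and the names Machine, step and toGraph are hypothetical.

```haskell
data Instruction
  = Ingredient String | Tool String | Reference String
  | Action String | Result String | Alternative String
  | Annotation String | Directive String | Unknown String
  deriving (Show, Eq)

type NodeId = Int
type Edge = (NodeId, NodeId)

-- Last non-annotation node, workspaces (current one first), edges (newest first).
data Machine = Machine (Maybe NodeId) [[NodeId]] [Edge]

toGraph :: [(NodeId, Instruction)] -> [Edge]
toGraph = finish . foldl step (Machine Nothing [[]] [])
  where
    finish (Machine _ _ es) = reverse es

step :: Machine -> (NodeId, Instruction) -> Machine
step (Machine prev ws es) (i, instr) = case instr of
  Ingredient _ -> push
  Tool _ -> push
  Reference _ -> push
  -- Connect everything on the workspace to the action; the product stays.
  Action _ -> Machine (Just i) ([i] : drop 1 ws) (edgesTo (top ws) ++ es)
  Result _ -> consume
  Alternative _ -> consume
  -- Edge to the last node that was not an annotation, if there is one.
  Annotation _ -> Machine prev ws (maybe es (\p -> (i, p) : es) prev)
  -- Dangling nodes: no edges, workspace untouched.
  Directive _ -> Machine (Just i) ws es
  Unknown _ -> Machine (Just i) ws es
  where
    top = concat . take 1
    edgesTo = map (\n -> (n, i))
    push = Machine (Just i) ((i : top ws) : drop 1 ws) es
    -- Skip empty workspaces so that consecutive results or alternatives
    -- label the same, previous workspace; then start a fresh one.
    consume =
      let nonEmpty = dropWhile null ws
      in Machine (Just i) ([] : nonEmpty) (edgesTo (top nonEmpty) ++ es)
```

Applied to the pasta recipe from section 4, with its nine instructions numbered 0 to 8, this sketch yields the edges (0,3), (1,3), (2,3), (3,6), (4,6), (5,6), (6,7) and (8,7).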

6.1 References

Results and alternatives can be referenced with the Reference instruction. Resolving these references does not happen while building the graph, but afterwards. This allows referencing a result or alternative before its definition with regard to their processing order.

Resolving references is fairly simple: For every reference a case-insensitive lookup of its object name is performed in a table containing all results and alternatives. If it succeeds an edge from every result and alternative returned to the reference in question is created.

References work before or after the result instruction.

Nonexistent references do not create an edge.

References can use amounts and units.
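A sketch of the resolution step under simplifying assumptions: each instruction carries only its object name (amount and unit are ignored), and case-insensitivity is approximated with Data.Char.toLower. The names resolveReferences and labelOf are hypothetical.

```haskell
import Data.Char (toLower)

-- Hypothetical, reduced instruction type; only the object name is kept.
data Instruction
  = Reference String | Result String | Alternative String | Other
  deriving (Show, Eq)

type NodeId = Int
type Edge = (NodeId, NodeId)

-- One edge from every matching result or alternative to the reference.
resolveReferences :: [(NodeId, Instruction)] -> [Edge]
resolveReferences nodes =
  [ (target, ref)
  | (ref, Reference name) <- nodes
  , (target, label) <- table
  , map toLower name == label
  ]
  where
    -- Lookup table of all labelled workspaces, keyed by lower-cased name.
    table = [(i, map toLower l) | (i, instr) <- nodes, Just l <- [labelOf instr]]
    labelOf (Result l) = Just l
    labelOf (Alternative l) = Just l
    labelOf _ = Nothing
```

Since the table is built from the complete node list before any lookup happens, a reference may appear before or after the result it names.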

There are a few cases that do not make sense here (like loops or multiple results with the same name). They are permitted at this stage, but rejected later.

6.2 Appendix

Find graph’s root node(s), that is a node without outgoing edges:

Get all nodes with edges pointing towards nodeid
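Both helpers can be sketched on a plain edge list. The names findRoots and incomingNodes, and the representation of nodes as integers, are assumptions of this sketch.

```haskell
type NodeId = Int
type Edge = (NodeId, NodeId)

-- Root nodes are the nodes that never occur as the source of an edge.
findRoots :: [NodeId] -> [Edge] -> [NodeId]
findRoots nodes edges = [n | n <- nodes, n `notElem` sources]
  where
    sources = map fst edges

-- All nodes with an edge pointing towards the given node.
incomingNodes :: [Edge] -> NodeId -> [NodeId]
incomingNodes edges n = [from | (from, to) <- edges, to == n]
```

A graph that consists only of a cycle, such as the two edges (0,1) and (1,0), has no root at all under this definition.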

7 Linting

Not every graph generated in the previous section is a useful recipe. Some instruction sequences just do not make sense. The tests in this section can detect those. Failing any of them does not render a stream of instructions or graph invalid. They just do not describe a useful recipe. Thus implementations must not generate or export such documents. However they should accept input that fails any of the tests and warn the user about the failure.

Additionally this section provides guidance on how to use the instructions provided by the Pesto language properly.

7.1 Graph properties

  • weakly connected, no dangling nodes/subgraphs
  • acyclic

The graph must have exactly one root node (i.e. a node with incoming edges only). This also requires all results and alternatives to be referenced somewhere. Directives are consumed either when parsing, generating a graph or linting. Otherwise they are dangling as well. Unknown instructions are always dangling.

Empty recipes or circular references have no root node:

Directives and unknown instructions are dangling and thus root nodes.

7.2 Metadata

root node can be alternative too?

The graph’s root node must be a result. It contains yield (amount and unit) and title (object) of the recipe.

Additional key-value metadata for the whole recipe can be added as annotations to the root node. If multiple annotations with the same key exist the key maps to a list of those values. Annotations that are unparseable key-value pairs are added as recipe description instead.

Key and value are separated by a colon. Keys must not contain whitespace or the colon char. A value may be empty.
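A sketch of the key-value split, assuming that the annotation is split at its first colon, that an empty key does not count as a key, and that leading whitespace of the value is dropped (as “language: en” in section 4 suggests). Annotations that do not fit are returned under the key “description”. The name parseMeta is hypothetical.

```haskell
import Data.Char (isSpace)

-- Split an annotation into (key, value) at the first colon.
parseMeta :: String -> (String, String)
parseMeta s = case break (== ':') s of
  (key, ':' : value)
    -- Keys must be non-empty (assumption) and free of whitespace.
    | not (null key), not (any isSpace key) -> (key, dropWhile isSpace value)
  -- Everything else is treated as free-form description.
  _ -> ("description", s)
```

Because only the first colon separates key and value, a value may itself contain colons, which matters for image URIs.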

Valid metadata keys are listed below. Additionally applications may add keys by prefixing them with “x-myapp-”, thus an application called “basil” adding “some-key” would use the full key “x-basil-some-key”.

The following metadata keys are permitted:

Both, title and description, are implicit.

The recipe’s language, as 2 character code (ISO 639-1, see http://www.loc.gov/standards/iso639-2/php/English_list.php).

Yield and time both must be a quantity.

An image can be a relative file reference or URI

Check the metadata’s value format. I.e. yield/time must be quantity

For instance a German-language recipe for one person would look like this:

Unparseable annotations or unknown keys are linting errors:

Root node annotations not containing a parseable key-value pair are assigned the key “description”.

7.3 Time is a tool

By definition time is a tool and not an ingredient.

Only actions can be annotated with a time. It can be used to indicate how long a certain action is expected to take (e.g. peeling potatoes takes two minutes) or how long the action is supposed to be executed (e.g. cook five minutes). More time annotations improve the software’s scheduling capabilities.

For example “cook 10 minutes” can be expressed with

7.4 Well-known units

Units can be arbitrary strings, but implementations should recognize the common metric units g (gram), l (litre) and m (metre). One of these prefixes may be used with each of them: m (milli-), c (centi-), d (deci-) and k (kilo-). Additionally time in s (second), min (minute), h (hour), d (day) should be accepted.
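The set of well-known units is small enough to enumerate. The sketch below simply builds it from the prefixes and base units named above and compares case-sensitively; the names wellKnownUnits and isWellKnownUnit are hypothetical.

```haskell
-- Every combination of an optional prefix with a metric base unit,
-- plus the time units.
wellKnownUnits :: [String]
wellKnownUnits = metric ++ time
  where
    metric = [p ++ b | b <- ["g", "l", "m"], p <- ["", "m", "c", "d", "k"]]
    time = ["s", "min", "h", "d"]

-- Case-sensitive membership test.
isWellKnownUnit :: String -> Bool
isWellKnownUnit = (`elem` wellKnownUnits)
```

Enumerating the strings sidesteps the ambiguity of single letters: m is both metre and the milli- prefix, and d is both day and the deci- prefix, yet each complete unit string is unique.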

Usage of imperial units (inch, pound, …) as well as non-XXX units like “teaspoon”, “cup”, … is discouraged because the former is used by just three countries in the world right now and the latter is language- and country-dependent. The implementation may provide the user with a conversion utility.

  • example: 1 oz = 28.349523125 g, which is unwieldy as an exact rational number (45359237/1600000 g) and can be approximated, for instance by 29767/1050 g
  • 15 oz are \(\frac{29767}{70} \mathrm{g} = 425+\frac{17}{70} \mathrm{g}\); since nobody sells 17/70 g the implementation would round down to ~425 g (although <1 g is not really enough to justify adding approx)
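The arithmetic of this example can be reproduced with exact rational numbers, using the approximation of 29767/1050 g per ounce quoted above. The function names are hypothetical, and a real conversion utility would still have to decide when to mark a rounded result as approximate.

```haskell
import Data.Ratio ((%))

-- Rational approximation of one ounce in grams, taken from the example.
gramsPerOunce :: Rational
gramsPerOunce = 29767 % 1050

ouncesToGrams :: Rational -> Rational
ouncesToGrams oz = oz * gramsPerOunce

-- Split an amount into whole grams and the remaining fraction.
wholeAndRest :: Rational -> (Integer, Rational)
wholeAndRest = properFraction
```

Here ouncesToGrams 15 is exactly 29767/70, and wholeAndRest splits it into 425 and 17/70 without any floating-point rounding.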

The unit is case-sensitive, thus

Should we allow case-insensitive units? References are case-insensitive as well…

7.5 References

All references must be resolved. An earlier check makes sure all results and alternatives are referenced at some point.

A result must have at least one incoming edge. This is a special case and can only occur at the beginning of a recipe.

Alternatives must have at least two incoming edges since a smaller amount would make the alternative pointless.

should we allow this? it does not make sense imo

  • reject loops
  • reject multiple results/alternatives with the same name

7.6 Ranges

The first amount of a range ratio must be strictly smaller than the second. This limitation is not enforced for ranges containing strings.
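This check is small enough to sketch directly. AmountStr is the name used in this document; the other type and constructor names, and the function rangeOk, are assumptions of the sketch.

```haskell
data Amount = AmountRatio Rational | AmountStr String
  deriving (Show, Eq)

data Approximately = Range Amount Amount | Approx Amount | Exact Amount
  deriving (Show, Eq)

-- A purely rational range must be strictly increasing.
-- Ranges involving strings and non-range amounts always pass.
rangeOk :: Approximately -> Bool
rangeOk (Range (AmountRatio a) (AmountRatio b)) = a < b
rangeOk _ = True
```

A range such as 2-2 fails the test because the comparison is strict, whereas a range with a string on either side is never rejected.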

7.7 Appendix

Every lint test checks a single aspect of the graph.

8 Serializing

  • Add instance for graph
  • use \(\mathcal{O}(1)\) string builder

Finally transform linear stream of instructions into a string again:

There are two special cases here, both for aesthetic reasons:

  1. If the denominator is one we can just skip printing it, because \(\frac{2}{1} = 2\) and
  2. if the numerator is larger than the denominator use mixed fraction notation, because \(\frac{7}{2} = 3+\frac{1}{2}\)
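Both special cases can be sketched as follows, assuming non-negative amounts and the slash-separated mixed notation from section 5.2 (integral part, numerator, denominator). The name serializeRatio is hypothetical.

```haskell
import Data.Ratio (denominator, numerator, (%))

serializeRatio :: Rational -> String
serializeRatio r
  -- Case 1: a denominator of one is not printed.
  | d == 1 = show n
  -- Case 2: improper fractions become int/num/denom.
  | n > d = show whole ++ "/" ++ show rest ++ "/" ++ show d
  | otherwise = show n ++ "/" ++ show d
  where
    n = numerator r
    d = denominator r
    (whole, rest) = n `divMod` d
```

Since Rational values are always stored in lowest terms, 6/4 and 3/2 both serialize to 1/1/2.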

9 Using this project

This project uses cabal. It provides the Codec.Pesto library that implements the Pesto language as described in the previous sections. It also comes with three binaries.

9.1 User interface

The user-interface has different modes of operation. All of them read a single recipe from the standard input.

9.1.1 dot

Since each recipe is just a directed graph (digraph), GraphViz’ dot language can represent recipes as well. Example:
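A sketch of how such output could be produced from labelled nodes and edges. It is not the output format of the actual binary: the graph name “recipe” and the function toDot are made up for illustration, and quoting the labels with show is only adequate for plain ASCII text.

```haskell
-- Render labelled nodes and edges as a GraphViz digraph.
toDot :: [(Int, String)] -> [(Int, Int)] -> String
toDot nodes edges =
  unlines (["digraph recipe {"] ++ map node nodes ++ map edge edges ++ ["}"])
  where
    -- `show` adds the surrounding double quotes and escapes inner ones.
    node (i, label) = "  " ++ show i ++ " [label=" ++ show label ++ "];"
    edge (a, b) = "  " ++ show a ++ " -> " ++ show b ++ ";"
```

The resulting text can be piped into the dot program to render the recipe as an image.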

9.1.2 metadata

Print metadata as key-value pairs, separated by =.

9.1.3 ingredients

Extract ingredients and print them in CSV format. This does not take alternatives into account yet.

9.2 Running tests

The testcases can be run with cabal test. This runs all testcases from all modules and prints a summary.

9.3 Building documentation

The documentation can be generated by running cabal run pesto-doc. It is exclusively based on the reStructuredText inside this package’s literate Haskell source code.

Pandoc outputs a single HTML5 page with syntax highlighting and MathJax for formulas.

A slightly customized template is used.

The module Codec.Pesto serves as starting point and it includes every other module in a sensible order. For the relative includes to work, we need to change our current working directory.

Output is written to the directory _build, which contains the corresponding stylesheet.