diff options
author | Lars-Dominik Braun <lars@6xq.net> | 2015-06-14 20:35:08 +0200 |
---|---|---|
committer | Lars-Dominik Braun <lars@6xq.net> | 2015-06-14 20:35:08 +0200 |
commit | e92c82e2c9ff541cd321ad7a8aedcf34e615197c (patch) | |
tree | fd07d24164450f25a224eb593922e4a4926d062b /src/Codec | |
download | pesto-e92c82e2c9ff541cd321ad7a8aedcf34e615197c.tar.gz pesto-e92c82e2c9ff541cd321ad7a8aedcf34e615197c.tar.bz2 pesto-e92c82e2c9ff541cd321ad7a8aedcf34e615197c.zip |
First public version
Diffstat (limited to 'src/Codec')
-rw-r--r-- | src/Codec/Pesto.lhs | 187 | ||||
-rw-r--r-- | src/Codec/Pesto/Dot.lhs | 24 | ||||
-rw-r--r-- | src/Codec/Pesto/Graph.lhs | 206 | ||||
-rw-r--r-- | src/Codec/Pesto/Lint.lhs | 323 | ||||
-rw-r--r-- | src/Codec/Pesto/Parse.lhs | 397 | ||||
-rw-r--r-- | src/Codec/Pesto/Parse.lhs-boot | 20 | ||||
-rw-r--r-- | src/Codec/Pesto/Serialize.lhs | 69 |
7 files changed, 1226 insertions, 0 deletions
diff --git a/src/Codec/Pesto.lhs b/src/Codec/Pesto.lhs new file mode 100644 index 0000000..c35d15f --- /dev/null +++ b/src/Codec/Pesto.lhs @@ -0,0 +1,187 @@ +========================= +Pesto specification draft +========================= + +Pesto is a text-based human-editable and machine-transformable cooking recipe +interchange format. + +.. class:: nodoc + +> module Codec.Pesto where + +About this document +------------------- + +This section contains various information about this document. The `second +section <motivation_>`_ motivates why inventing another file format is +necessary, followed by the goals_ of Pesto. After a short Pesto `primer +<introduction-by-example_>`_ intended for the casual user the language’s +`syntax <language-syntax_>`_ and `semantics <language-semantics_>`_ are +presented. The `linting section <linting_>`_ limits the language to useful +cooking recipes. Examples for user presentation of recipes and serialization +follow. + +Being a literate program this document is specification and reference +implementation at the same time. The code is written in Haskell_ and uses the +parsec_ parser combinator library, as well as HUnit_ for unit tests. Even +without knowing Haskell’s syntax you should be able to understand this +specification. There’s a description above every code snippet explaining what +is going on. + +.. _Haskell: http://learnyouahaskell.com/ +.. _HUnit: http://hackage.haskell.org/package/HUnit +.. _parsec: http://hackage.haskell.org/package/parsec + +The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, +“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be +interpreted as described in `RFC 2119`_. + +.. _RFC 2119: http://tools.ietf.org/html/rfc2119 + +:Version: 1-draft +:License: CC0_ +:Website: https://6xq.net/pesto/ +:Discussion: https://github.com/PromyLOPh/pesto +:Contributors: + - `Lars-Dominik Braun <mailto:lars+pesto@6xq.net>`_ + +.. _CC0: https://creativecommons.org/publicdomain/zero/1.0/ + +.. _motivation: + +Motivation +---------- + +The landscape of recipe interchange formats is quite fragmented. First of all +there’s HTML microdata. `Google rich snippets`_, which are equivalent to the +schema.org_ microdata vocabulary, are widely used by commercial recipe sites. +Although the main objective of microdata is to make content machine-readable +most sites will probably do so, because it is considered a search-engine +optimization (SEO). Additionally parsing HTML pulled from the web is a +nightmare and thus not a real option for sharing recipes. h-recipe_ provides a +second vocabulary, but has not been adopted widely. + +.. _Google rich snippets: https://developers.google.com/structured-data/rich-snippets/recipes +.. _schema.org: http://schema.org/Recipe +.. _h-recipe: http://microformats.org/wiki/h-recipe + +Mealmaster_, an ancient file format with – due to its age – many problems, +seems to have the most traction right now. A large amount of recipe files is +`available in this format <http://www.ffts.com/recipes.htm>`_. Rezkonv_ aims to +improve the Mealmaster format. However the specification is available on +request only. Another text-based format, MXP_ (plus MX2, MZ2), is used by +Mastercook. + +A newer format, YumML_, is based on YAML. The specification has been removed +from the web and is available through the Web Archive only. + +.. _xml-formats: + +There’s a number of XML-based formats: +RecipeML_, formerly known as DESSERT was released in 2002 (version 0.5). The +license requires attribution and – at the same time – forbids using the name +RecipeML for promotion without written permission. +REML_ was released in 2005 (version 0.5). It is rather complicated and has no +license. +`RecipeBook XML`_, released in 2005 as well and shared unter the terms of `CC +by-sa`_ is not available on the web any more. +CookML_, created in 2006 (version 1.0.4) for the german-language shareware +program Kalorio has a custom and restrictive licence that requires attribution and +forbids derivate works. +KRecipes_ uses its own export format. However there is no documentation +whatsoever. Gourmet_’s export format suffers from the same problem. The only +document available is the `DTD +<https://github.com/thinkle/gourmet/blob/7715c6ef87ee8c106f0a021972cd70d61d83cadb/data/recipe.dtd>`_. + +.. _CC by-sa: https://creativecommons.org/licenses/by-sa/2.5/ + +All of the formats above share a common design aspect: They split recipes into +two parts, ingredients and instructions. This is quite odd given the nature of +cooking recipes. RxOL_, created in 1985, represents recipes as a graph with +postfix notation and minimal “chitchat”. Although Pesto is not a direct +descendant of RxOL it’s syntax and semantics are quite similar. + +.. _REML: http://reml.sourceforge.net/ +.. _RecipeML: http://www.formatdata.com/recipeml/index.html +.. _CookML: http://www.kalorio.de/index.php?Mod=Ac&Cap=CE&SCa=../cml/CookML_EN +.. _Mealmaster: http://www.wedesoft.de/anymeal-api/mealmaster.html +.. _MXP: http://www.oocities.org/heartland/woods/2073/Appendix.htm +.. _RecipeBook XML: http://web.archive.org/web/20141101132332/http://www.happy-monkey.net/recipebook/ +.. _YumML: http://web.archive.org/web/20140703234140/http://vikingco.de/yumml.html +.. _Rezkonv: http://www.rezkonv.de/software/rksuite/rkformat.html +.. _RxOL: http://www.dodomagnifico.com/641/Recipes/CompCook.html +.. _Gourmet: http://thinkle.github.io/gourmet/ +.. _KRecipes: http://krecipes.sourceforge.net/ + +.. There is a copy at http://diyhpl.us/~bryan/papers2/CompCook.html as well + +.. More interesting stuff: +.. - http://blog.moertel.com/posts/2010-01-08-a-formal-language-for-recipes-brain-dump.html +.. - http://www.dangermouse.net/esoteric/chef.html + +Goals +----- + +First of all recipes are written *by* humans *for* humans. Thus a +human-readable recipe interchange format is not enough. The recipes need to be +human-editable without guidance like a GUI or assistant. That’s why, for +instance, XML is not suitable and the interchange formats listed `above +<xml-formats_>`_ have largely failed to gain traction. XML, even though simple +itself, is still too complicated for the ordinary user. Instead a format needs +to be as simple as possible, with as little markup as possible. A human editor +must be able to remember the entire syntax. This works best if the file +contents “make sense”. A good example for this is Markdown_. + +.. _Markdown: https://daringfireball.net/projects/markdown/syntax + +We also have to acknowledge that machines play an important role in our daily +life. They can help us, the users, accomplish our goals if they are able to +understand the recipes as well. Thus they too need to be able to read and write +recipes. Again, designing a machine-readable format is not enough. Recipes must +be machine-transformable. A computer program should be able to create a new +recipe from two existing ones, look up the ingredients and tell us how many +joules one piece of that cake will have. And so on. + +That being said, Pesto does not aim to carry additional information about +ingredients or recipes itself. Nutrition data for each ingredient should be +maintained in a separate database. Due to its minimal syntax Pesto is also not +suitable for extensive guides on cooking or the usual chitchat found in cooking +books. + +Introduction by example +----------------------- + +So let’s start by introducing Pesto in a XXXnon-formal, XXX way: By example. We +are now going to cook XXX. The following recipe contains all the information +you need to do that. + +.. class:: todo + +do it. + +See https://github.com/PromyLOPh/rezepte for example recipes. + +.. include:: Pesto/Parse.lhs +.. include:: Pesto/Graph.lhs +.. include:: Pesto/Lint.lhs +.. include:: Pesto/Dot.lhs +.. include:: Pesto/Serialize.lhs + +Using this project +------------------ + +This project uses cabal. It provides the Codec.Pesto library that implements +the Pesto language as described in the previous sections. It also comes with +three binaries. + +.. include:: ../Main.lhs +.. include:: ../Test.lhs +.. include:: ../Doc.lhs + +Final words +----------- + +.. class:: todo + +Do we even need this? + diff --git a/src/Codec/Pesto/Dot.lhs b/src/Codec/Pesto/Dot.lhs new file mode 100644 index 0000000..a0ac161 --- /dev/null +++ b/src/Codec/Pesto/Dot.lhs @@ -0,0 +1,24 @@ +Presentation +------------ + +.. class:: nodoc + +> module Codec.Pesto.Dot (toDot) where +> import Codec.Pesto.Serialize (serialize) + +Since each recipe is just a directed graph (digraph), we can use the dot +language to represent it as well. This in turnXXX can be transformed into an +image, for example. + +> toDot nodes edges = unlines $ ["digraph a {\nnode [fontname=\"Roboto Semi-Light\"];"] ++ n ++ e ++ ["}"] +> where +> f (a, b) = show a ++ " -> " ++ show b ++ ";" +> e = map f edges +> n = map (\(a, b) -> show a ++ " [label=\"" ++ dotEncodeString (serialize b) ++ "\"];") nodes +> addcolor = "#e6ee9c" + +> dotEncodeString = concatMap dotEncodeChar +> dotEncodeChar '\n' = "\\n" +> dotEncodeChar '"' = "\\\"" +> dotEncodeChar x = [x] + diff --git a/src/Codec/Pesto/Graph.lhs b/src/Codec/Pesto/Graph.lhs new file mode 100644 index 0000000..4ea2886 --- /dev/null +++ b/src/Codec/Pesto/Graph.lhs @@ -0,0 +1,206 @@ +Language semantics +------------------ + +.. class:: nodoc + +> module Codec.Pesto.Graph ( +> toGraph +> , walkRoot +> , outgoing +> , incoming +> , firstNodeId +> , resolveReferences +> , test +> ) where +> import Data.Char (isSpace, toLower, isLetter) +> import Data.List (sort, nub) +> import Test.HUnit hiding (test) +> +> import Codec.Pesto.Parse hiding (test) + +Pesto’s syntax drives a stack-based machine that transforms the linear stream +of operations generated by the parser into a directed graph. Think of the stack +as your kitchen’s workspace that is used to prepare the food’s +components. You can add new ingredients, perform actions on them, put them +aside and add them again. + +This function processes a list of nodes, that is operations uniquely identified +by an integer and returns the edges of the directed graph as a list of tuples. + +> toGraph nodes = edges +> where +> (_, _, edges) = foldl f (Nothing, [[]], []) nodes + +Ingredients are simply added to the current workspace. They should for example +appear on the shopping list. + +> f ctx (i, Ingredient _) = addToStack ctx i + +The same happens for for tools. However they are not part of the final product, +but used in the process of making it. For instance they do not appear on the +shopping list. `Time is a tool <time-is-a-tool_>`_. + +> f ctx (i, Tool _) = addToStack ctx i + +Actions take all ingredients and tools currently on the workspace, perform some +action with them and put the product back onto the workspace. + +> f (_, stack:sx, edges) (i, Action _) = (Just i, [i]:stack:sx, edgesTo i stack ++ edges) + +Results add a label to the current workspace’s contents and move them out of +the way. It should be a meaningful name, not just A and B obviously. +Consecutive Results add different labels to the same workspace. That’s useful +when an action yields multiple results at once that are processed in different +ways. + +> f ctx (i, Result _) = consumeStack ctx i + +Alternatives too add a label to the current workspace’s content, but they pick +one of things on the workspace and throw everything else away. This allows +adding optional or equivalent ingredients to a recipe (i.e. margarine or butter). + +> f ctx (i, Alternative _) = consumeStack ctx i + +References are similar to ingredients. They are used to add items from a +workspace labeled with Result or Alternative. More on that `in the next section +<references_>`_. + +> f ctx (i, Reference _) = addToStack ctx i + +Annotations add a description to any of the previous operations. They can be +used to provide more information about ingredients (so “hot water” becomes +“+water (hot)”, tools (“&oven (200 °C)”) or actions (“[cook] (XXX)”). + +> f ctx@(Nothing, s, edges) (_, Annotation _) = ctx +> f (Just prev, s, edges) (i, Annotation _) = (Just prev, s, (i, prev):edges) + +These are helper functions: + +> addToStack (_, stack:sx, edges) i = (Just i, (i:stack):sx, edges) +> consumeStack (_, s, edges) i = +> let +> stack = dropWhile null s +> (top:sx) = if null stack then [[]] else stack +> in (Just i, []:top:sx, edgesTo i top ++ edges) +> edgesTo i = map (\x -> (x, i)) + +Here are a few example of how this stack-machine works. Each edge is a tuple of +two integer numbers. These are the nodes it connects, starting with zero. +Ingredient, Tool and Reference itself do not create any edges: + +> testGraph = [ +> cmpGraph "+ketchup &spoon *foobar" [] + +But Action, Alternative and Result do in combination with them: + +> , cmpGraph "+foobar [barbaz]" [(0, 1)] +> , cmpGraph "+foobar |barbaz" [(0, 1)] +> , cmpGraph "+foobar >barbaz" [(0, 1)] +> , cmpGraph "+foobar +B >barbaz" [(0, 2), (1, 2)] +> , cmpGraph "+foobar >barbaz +foobar >barbaz" [(0, 1), (2, 3)] +> , cmpGraph "+foobar [barbaz] +foobar >barbaz" [(0, 1), (1, 3), (2, 3)] +> , cmpGraph "&foobar [barbaz] [C] >D" [(0, 1), (1, 2), (2, 3)] + +If the stack is empty, i.e. it was cleared by a Result or Alternative +operation, consecutive results or alternatives operate on the *previous*, +non-empty stack. + +> , cmpGraph "+foobar >barbaz >C" [(0, 1), (0, 2)] +> , cmpGraph "+foobar |barbaz |C" [(0, 1), (0, 2)] +> , cmpGraph "+foobar >barbaz |C" [(0, 1), (0, 2)] + +Unless that stack too is empty. Then they do nothing: + +> , cmpGraph ">foobar >foobar" [] +> , cmpGraph "|foobar |foobar" [] +> , cmpGraph "(foobar) (foobar)" [] +> , cmpGraph "[foobar]" [] + +The Annotation operation always creates an edge to the most-recently processed +node that was not an annotation. Thus two consecutive annotations create edges +to the same node. + +> , cmpGraph "+foobar (barbaz)" [(1, 0)] +> , cmpGraph "+foobar (barbaz) (C)" [(1, 0), (2, 0)] +> , cmpGraph "+foobar (barbaz) >barbaz" [(1, 0), (0, 2)] +> , cmpGraph "+foobar >barbaz (C)" [(0, 1), (2, 1)] +> , cmpGraph "+foobar |barbaz (C)" [(0, 1), (2, 1)] +> , cmpGraph "*foobar (C)" [(1, 0)] +> ] + +References +++++++++++ + +Results and alternatives can be referenced with the Reference operation. +Resolving these references does not happen while buiding the graph, but +afterwards. This allows referencing an a result or alternative before its +definition with regard to the their processing order. + +Resolving references is fairly simple: For every reference its object name a +case-insensitive looked is performed in a table containing all results and +alternatives. If it succeeds an edge from every result and alternative returned +to the reference in question is created. + +> resolveReferences nodes = foldl f [] nodes +> where +> f edges (i, ref@(Reference _)) = map (\x -> (x, i)) (findTarget nodes ref) ++ edges +> f edges _ = edges + +> findTarget nodes (Reference (Quantity _ _ a)) = map fst $ filter (isTarget a) nodes +> where +> lc = map toLower +> isTarget dest (_, Result x) = lc x == lc dest +> isTarget dest (_, Alternative x) = lc x == lc dest +> isTarget _ _ = False + + +References works before or after the result operation. + +> testRef = [ +> cmpGraphRef ">foobar *foobar" [(0, 1)] +> , cmpGraphRef ">foobar |foobar *foobar" [(0, 2), (1, 2)] +> , cmpGraphRef "+A >foobar +B >barbaz *foobar *barbaz" [(1, 4), (3, 5)] +> , cmpGraphRef "*foobar >foobar" [(1, 0)] + +Nonexistent references do not create an edge. + +> , cmpGraphRef ">foobar *barbaz" [] + +References can use amounts and units. + +> , cmpGraphRef ">foobar *1 _ foobar *2 _ foobar" [(0, 1), (0, 2)] + +There are a few cases that do not make sense here (like loops or multiple +results with the same name). They are permitted at this stage, but rejected +`later <reject-loops_>`_. + +> , cmpGraphRef "*foobar |foobar >foobar" [(1, 0), (2, 0)] +> , cmpGraphRef "|foobar *foobar >foobar *foobar" [(0, 1), (0, 3), (2, 1), (2, 3)] +> ] + +Appendix +++++++++ + +> runGraphWith f doc expect = sort edges ~?= sort expect +> where +> (Right op) = parse ("%pesto-1 " ++ doc) +> nodes = (zip [firstNodeId..] . map snd . operations) op +> edges = f nodes +> cmpGraph = runGraphWith toGraph +> cmpGraphRef = runGraphWith resolveReferences + +> firstNodeId = 0 :: Int + +Find graph’s root node(s), that is a node without outgoing edges: + +> walkRoot nodes edges = let out = nub $ map fst edges +> in filter (\(x, _) -> notElem x out) nodes + +Get all nodes with edges pointing towards nodeid + +> incoming edges (nodeid, _) = filter ((==) nodeid . snd) edges + +> outgoing edges (nodeid, _) = filter ((==) nodeid . fst) edges + +> test = ["graph" ~: testGraph, "ref" ~: testRef] + diff --git a/src/Codec/Pesto/Lint.lhs b/src/Codec/Pesto/Lint.lhs new file mode 100644 index 0000000..b96c9de --- /dev/null +++ b/src/Codec/Pesto/Lint.lhs @@ -0,0 +1,323 @@ +Linting +------- + +.. class:: nodoc + +> module Codec.Pesto.Lint (lint, test, parseMetadata) where +> import Test.HUnit hiding (test) +> import Data.List (sort, isPrefixOf) +> import Control.Applicative ((<*>), (<$>), (<*), (*>)) +> import Text.Parsec hiding (parse) +> import Text.Parsec.Char +> +> import Codec.Pesto.Graph hiding (test) +> import Codec.Pesto.Parse hiding (test) + +Not every graph generated in the previous section is a useful recipe, since +some combinations of operations just do not make sense. The linting test in +this section can detect common errors. Failing any of these tests does not +render a recipe invalid, but *useless*. Thus implementations must not create +such recipes. They may be accepted as input from the user. + +Every lint test checks a single aspect of the graph. + +> lint nodes edges = concatMap (\f -> f nodes edges) lintTests + +Metadata +++++++++ + +.. _recipetitle: +.. _resultsused: + +The graph must have exactly one root node (i.e. a node with incoming edges +only) and it must be a result. The result’s object name is the recipe’s title. +This also requires all results and alternatives to be referenced somewhere. + +> rootIsResult nodes edges = case walkRoot nodes edges of +> [] -> [LintResult NoRootNode []] +> (i, x):[] -> if isResult x then [] else [LintResult NonResultRootNode [i]] +> xs -> [LintResult MoreThanOneRootNode (map fst xs)] + + +Empty recipes or circular references have no root node: + +> testLintMetadata = [ +> cmpLint "" [LintResult NoRootNode []] +> , cmpLint "*foobar >foobar" [LintResult NoRootNode []] +> , cmpLint "+foobar" [LintResult NonResultRootNode [0]] + +This recipe’s title is “Pesto”. + +> , cmpLint "+foobar >Pesto" [] +> ] + +Additional key-value metadata for the whole recipe can be provided by adding +annotations to the the root node. If multiple annotations with the same key +exist the key maps to a list of those values. + +> parseMetadata = runParser metadata () "" +> metadata = (,) +> <$> (char '.' *> many1 (noneOf ":")) +> <*> (char ':' *> spaces1 *> many1 anyChar) + +> rootAnnotations nodes edges = foldl check [] rootIncoming +> where +> rootIncoming = map ((!!) nodes . fst) $ concatMap (incoming edges) $ walkRoot nodes edges +> check xs (i, Annotation s) | "." `isPrefixOf` s = case parseMetadata s of +> (Left _) -> LintResult InvalidMetadata [i]:xs +> (Right (k, v)) -> if isKeyKnown k +> then xs +> else LintResult UnknownMetadataKey [i]:xs +> check xs _ = xs + +.. class:: todo + +reject metadata annotations for non-root nodes + +The valid keys are listed below. Additionally applications may add their own +metadata with “x-appname-keyname”. + +> isKeyKnown k = k `elem` knownKeys || "x-" `isPrefixOf` k + +The following metadata keys are permitted: + +> knownKeys = [ + +The recipe’s language, as 2 character code (ISO 639-1:2002). + +> "language" + +Yield and time both must be a quantity. + +> , "yield" +> , "time" + +An image can be a relative file reference or URI + +> , "image" +> , "author" +> ] + +For instance a german language recipe for one person would look like this: + +> testRootAnnotations = [ +> cmpLint "+foo >foobar (.language: de) (.yield: 1 _ Person) (.x-app-this: value)" [] + +Unparseable annotations or unknown keys are linting errors: + +> , cmpLint "+foo >foobar (.invalid)" [LintResult InvalidMetadata [2]] +> , cmpLint "+foo >foobar (.unknown: value)" [LintResult UnknownMetadataKey [2]] + +Root node annotations not starting with a dot are considered recipe descriptions. + +> , cmpLint "+foo >foobar (some description)" [] +> ] + +.. _time-is-a-tool: + +Time is a tool +++++++++++++++ + +By definition time is a tool and not an ingredient. + +> timeUnits = ["s", "min", "h", "d"] +> +> isTime (Quantity _ unit "") | unit `elem` timeUnits = True +> isTime _ = False + +> timeIsATool nodes edges = foldl f [] nodes +> where +> f xs n@(nodeid, Ingredient q) | isTime q = LintResult TimeIsATool [nodeid]:xs +> f xs _ = xs + +> testLintQuantity = [ +> cmpLint "+10 min >foo" [LintResult TimeIsATool [0]] +> , cmpLint "+10-12 h >foo" [LintResult TimeIsATool [0]] +> , cmpLint "+90/120 s >foo" [LintResult TimeIsATool [0]] +> , cmpLint "+~12 s >foo" [LintResult TimeIsATool [0]] +> , cmpLint "&10 min [bar] >foo" [] +> ] + +Only actions can be annotated with a time. + +> timeAnnotatesAction nodes edges = foldl f [] nodes +> where +> f xs n@(nodeid, Tool q) | isTime q && (not . allActions) (outgoing edges n) = LintResult TimeAnnotatesAction [nodeid]:xs +> f xs _ = xs +> toNodelist = (!!) nodes . snd +> allActions = all (isAction . snd . toNodelist) + +For example “cook 10 minutes” can be expressed with + +> testTimeAnnotatesAction = [ +> cmpLint "&10 min [cook] >soup" [] +> , cmpLint "&10 min [cook] &5-6 h [cook again] >soup" [] +> , cmpLint "&10 min >soup" [LintResult TimeAnnotatesAction [0]] +> , cmpLint "&10 min &15 min |time *time [cook] >soup" +> [LintResult TimeAnnotatesAction [0], LintResult TimeAnnotatesAction [1]] +> ] + +.. _well-known-units: + +Well-known units +++++++++++++++++ + +Units can be an arbitrary strings, but implementations should recognize the +common metric units g (gram), l (litre) and m (metre). One of these prefixes +may be used with each of them: m (milli-), c (centi-), d (dezi-) and k (kilo-). +Additionally time in s (second), min (minute), h (hour), d (day) should be +accepted. + +> wellKnownUnit nodes edges = foldl f [] nodes +> where +> f xs n@(nodeid, Ingredient q) | (not . known) q = +> LintResult UnitNotWellKnown [nodeid]:xs +> f xs n@(nodeid, Tool q) | (not . known) q = +> LintResult UnitNotWellKnown [nodeid]:xs +> f xs _ = xs +> known (Quantity _ unit _) = unit `elem` knownUnits +> knownUnits = [ +> "" +> , "mg", "g", "kg" +> , "ml", "cl", "dl", "l" +> , "cm", "dm", "m" +> ] ++ timeUnits + +Usage of imperial units (inch, pound, …) as well as non-XXX units like +“teaspoon”, “cup”, … is discouraged because the former is used by just three +countries in the world right now and the latter is language- and +country-dependent. The implementation may provide the user with a conversion +utility. + +.. class:: todo + +- example: 1 oz ~= 28.349523125 g, can only be approximated by rational number, for instance 29767/1050 g +- 15 oz would are :math:`\frac{29767}{70} \mathrm{g} = 425+\frac{17}{70} \mathrm{g}`, since nobody sells 17/70 g the implementation would round down to ~425 g (although <1g is not really enough to justify adding approx) + +> testLintWellKnownUnit = [ +> cmpLint "+1 in foobar >bar" [LintResult UnitNotWellKnown [0]] +> , cmpLint "+2 teaspoons foobar >bar" [LintResult UnitNotWellKnown [0]] +> , cmpLint "+3 cups foobar >bar" [LintResult UnitNotWellKnown [0]] +> , cmpLint "+1 ml foobar >bar" [] +> , cmpLint "+1 cl foobar >bar" [] +> , cmpLint "+1 dl foobar >bar" [] +> , cmpLint "+1 l foobar >bar" [] +> , cmpLint "+2 _ something >bar" [] +> , cmpLint "&1 min [foo] >bar" [] + +The unit is case-sensitive, thus + +.. class:: todo + +Should we allow case-insensitive units? References are case-insensitive as +well… + +> , cmpLint "+1 Mg foobar >bar" [LintResult UnitNotWellKnown [0]] +> , cmpLint "+1 kG foobar >bar" [LintResult UnitNotWellKnown [0]] +> , cmpLint "&1 MIN [foo] >bar" [LintResult UnitNotWellKnown [0]] +> ] + +References +++++++++++ + +All references must be resolved. An `earlier check <resultsused_>`_ makes sure +all results and alternatives are referenced at some point. + +> referencesResolved nodes edges = foldl f [] nodes +> where +> f xs n@(nodeid, Reference _) | null (incoming edges n) = +> LintResult UndefinedReference [nodeid]:xs +> f xs _ = xs + +> testLintRefs = [ +> cmpLint "*foobar >foobar >barbaz" [] +> , cmpLint "*foobar >foo" [LintResult UndefinedReference [0]] +> ] + +A result must have at least one incoming edge. This is a special case and can +only occur at the beginning of a recipe. + +> resultNonempty nodes edges = foldl f [] nodes +> where +> f xs n@(nodeid, Result _) | null (incoming edges n) = +> LintResult TooFewChildren [nodeid]:xs +> f xs _ = xs + +> testLintResultNonempty = [ +> cmpLint ">bar *bar >baz" [LintResult TooFewChildren [0]] +> , cmpLint "+A >bar *bar >baz" [] +> , cmpLint "+A >bar >foo *bar *foo >baz" [] +> ] + +Alternatives must have at least two incoming edges since a smaller amount would +make the alternative pointless. + +> twoAlternatives nodes edges = foldl f [] nodes +> where +> f xs n@(nodeid, Alternative _) | length (incoming edges n) < 2 = +> LintResult TooFewChildren [nodeid]:xs +> f xs _ = xs + +> testLintTwoAlternatives = [ +> cmpLint "+A |foo *foo >bar" [LintResult TooFewChildren [1]] +> , cmpLint "+A +B |foo *foo >bar" [] + +.. class:: todo + +should we allow this? it does not make sense imo + +> , cmpLint "+A &B |foo *foo >bar" [] +> ] + +.. _reject-loops: + +.. class:: todo + +- reject loops +- reject multiple results/alternatives with the same name + +Appendix +++++++++ + +> data LintResult a = LintResult LintStatus [a] deriving (Show, Eq, Ord) +> data LintStatus = +> NoRootNode +> | NonResultRootNode +> | MoreThanOneRootNode +> | UndefinedReference +> | TooFewChildren +> | TimeIsATool +> | TimeAnnotatesAction +> | UnitNotWellKnown +> | UnknownMetadataKey +> | InvalidMetadata +> deriving (Show, Eq, Ord) + +> lintTests = [ +> rootIsResult +> , referencesResolved +> , resultNonempty +> , twoAlternatives +> , timeIsATool +> , timeAnnotatesAction +> , wellKnownUnit +> , rootAnnotations +> ] + +> cmpLint doc expect = doc ~: sort (lint nodes edges) ~?= sort expect +> where +> (Right op) = parse ("%pesto-1 " ++ doc) +> nodes = (zip [firstNodeId..] . map snd . operations) op +> edges = toGraph nodes ++ resolveReferences nodes + +> test = [ +> testLintMetadata +> , testRootAnnotations +> , testLintRefs +> , testLintQuantity +> , testLintWellKnownUnit +> , testTimeAnnotatesAction +> , testLintTwoAlternatives +> , testLintResultNonempty +> ] + diff --git a/src/Codec/Pesto/Parse.lhs b/src/Codec/Pesto/Parse.lhs new file mode 100644 index 0000000..745d339 --- /dev/null +++ b/src/Codec/Pesto/Parse.lhs @@ -0,0 +1,397 @@ +Language syntax +--------------- + +.. class:: nodoc + +> module Codec.Pesto.Parse ( +> parse +> , test +> , Operation(..) +> , Quantity(..) +> , Unit(..) +> , Object(..) +> , Approximately(..) +> , Amount(..) +> , Recipe(..) +> , isResult +> , isReference +> , isAlternative +> , isAnnotation +> , isAction +> , spaces1 +> , notspace +> ) where +> import Control.Applicative ((<*>), (<$>), (<*), (*>)) +> import Data.Char (isSpace, toLower, isLetter) +> import Data.Ratio ((%)) +> import Text.Parsec hiding (parse) +> import Text.Parsec.Char +> import Text.ParserCombinators.Parsec.Pos (newPos) +> import Text.ParserCombinators.Parsec.Error (ParseError, Message, +> errorMessages, messageEq, newErrorUnknown) +> import Test.HUnit hiding (test) +> +> import Codec.Pesto.Serialize (serialize) + +XXX: magic should be an operation +XXX: this parser should accept invalid operations + +From the XXXsyntactic point of view a Pesto recipe is just a list of +space-delimited operations. It is encoded with UTF-8_ and starts with a magic +identifier (``%pesto-1``) followed by one or more spaces (spaces1_). Every +character within the Unicode whitespace class is considered a space. + +.. _UTF-8: https://tools.ietf.org/html/rfc3629 + +.. _spaces1: +.. _Recipe: + +> data Recipe = Recipe { +> version :: Integer +> , operations :: [(SourcePos, Operation)] +> } deriving Show +> +> recipe = Recipe +> <$> magic <* spaces1 +> <*> ((,) <$> getPosition <*> operation) `sepEndBy` spaces1 +> <* eof +> <?> "recipe" +> +> spaces1 = many1 space + +The file identifier consists of the string ``%pesto-`` followed by an integral +number and arbitrary non-space characters. They are reserved for future use and +must be ignored by parsers implementing this version of pesto. A byte order +mark (BOM) must not be used. + +> magic = string "%pesto-" *> int <* skipMany notspace <?> "magic" +> notspace = satisfy (not . isSpace) + +.. _Operation: +.. _Ingredient: +.. _Tool: +.. _Result: +.. _Alternative: +.. _Reference: +.. _Annotation: +.. _Action: + +The following *operations* are supported: + +> data Operation = +> Annotation String +> | Ingredient Quantity +> | Tool Quantity +> | Action String +> | Reference Quantity +> | Result Object +> | Alternative Object +> deriving (Show, Eq) +> +> operation = +> try annotation +> <|> try ingredient +> <|> try tool +> <|> try action +> <|> try result +> <|> try alternative +> <|> try reference +> <?> "operation" + +The pesto grammar has two kinds of operations: The first one begins with a +start character and consumes characters up to and including a terminating +symbol (``end``), which can be escaped with a backslash (``\``): + +> betweenEscaped start end = +> char start +> *> many (try (char '\\' *> char end) <|> satisfy (/= end)) +> <* char end + +Annotations and Actions both are of this kind: + +> annotation = Annotation <$> betweenEscaped '(' ')' +> action = Action <$> betweenEscaped '[' ']' + +Here are examples for both: + +> testOpterm = [cmpOperation "(skinless\nboneless)" (Right (Annotation "skinless\nboneless")) +> , cmpOperation "[stir together]" (Right (Action "stir together")) +> , cmpOperation "[stir\\]together]" (Right (Action "stir]together"))] + + +The second one starts with one identifying character, ignores the following +whitespace characters and then consumes an object or a quantity. + +> oparg ident cont = char ident *> spaces *> cont +> ingredient = oparg '+' (Ingredient <$> quantity) +> tool = oparg '&' (Tool <$> quantity) +> result = oparg '>' (Result <$> object) +> alternative = oparg '|' (Alternative <$> object) +> reference = oparg '*' (Reference <$> quantity) + +> testOparg = [ +> cmpOperation "+100 g flour" (Right (Ingredient (Quantity (Exact (AmountRatio (100%1))) "g" "flour"))) + +> , cmpOperation "&oven" (Right (Tool (Quantity (Exact (AmountStr "")) "" "oven"))) +> , cmpOperation ">dough" (Right (Result "dough")) +> , cmpOperation "|trimmings" (Right (Alternative "trimmings")) +> , cmpOperation "*fish" (Right (Reference (Quantity (Exact (AmountStr "")) "" "fish"))) +> , cmpOperation3 "* \t\n 1 _ cheese" (Right (Reference (Quantity (Exact (AmountRatio (1%1))) "" "cheese"))) "*1 _ cheese" +> ] + +Qstr +++++ + +Before introducing quantities we need to have a look at qstr, which is used by +them. A qstr, short for quoted string, can be – you guessed it already – a +string enclosed in double quotes, a single word or the underscore character +that represents the empty string. + +> qstr = try (betweenEscaped '"' '"') +> <|> word +> <|> char '_' *> return "" + +A word always starts with a letter, followed by any number of non-space +characters. + +> word = (:) <$> letter <*> many notspace + +The empty string can be represented by two double quotes or the underscore, but +not the empty string itself. + +> testQstr = [ +> cmpQstr3 "\"\"" (Right "") "_" +> , cmpQstr "_" (Right "") +> , cmpQstr "" parseError + +Any Unicode character with a General_Category major class L (i.e. a letter, see +`Unicode standard section 4.5 +<http://www.unicode.org/versions/Unicode7.0.0/ch04.pdf>`_ for example) is +accected as first character of a word. That includes german umlauts as well as +greek or arabic script. Numbers, separators, punctuation and others are not +permitted. + +> , cmpQstr "water" (Right "water") +> , cmpQstr "Äpfel" (Right "Äpfel") +> , cmpQstr "τυρί" (Right "τυρί") +> , cmpQstr "جبن" (Right "جبن") +> , cmpQstr "1sugar" parseError +> , cmpQstr "+milk" parseError +> , cmpQstr "∀onion" parseError + +The remaining letters of a word can be any character, including symbols, +numbers, … + +> , cmpQstr "rump-roast" (Right "rump-roast") +> , cmpQstr "v1negar" (Right "v1negar") +> , cmpQstr "mush\"rooms" (Right "mush\"rooms") + +…but not spaces. + +> , cmpQstr " tomatoes" parseError +> , cmpQstr "tomatoes " parseError +> , cmpQstr "lemon juice" parseError +> , cmpQstr "sour\tcream" parseError +> , cmpQstr "white\nwine" parseError + +If a string contains spaces or starts with a special character it must be +enclosed in double quotes. + +> , cmpQstr3 "\"salt\"" (Right "salt") "salt" +> , cmpQstr "\"+milk\"" (Right "+milk") +> , cmpQstr "\"soy sauce\"" (Right "soy sauce") +> , cmpQstr "\"1sugar\"" (Right "1sugar") +> , cmpQstr "\"chicken\tbreast\nmeat\"" (Right "chicken\tbreast\nmeat") + +Double quotes within a string can be quoted by prepending a backslash. However +the usual escape codes like \\n, \\t, … will *not* be expanded. + +> , cmpQstr "\"vine\"gar\"" parseError +> , cmpQstr3 "\"vine\\\"gar\"" (Right "vine\"gar") "vine\"gar" +> , cmpQstr "\"oli\\ve oil\"" (Right "oli\\ve oil") +> , cmpQstr "\"oli\\\\\"ve oil\"" (Right "oli\\\"ve oil") +> , cmpQstr3 "\"sal\\tmon\"" (Right "sal\\tmon") "sal\\tmon" +> ] + +Quantity +++++++++ + +The operations Ingredient, Tool and Reference accept a *quantity*, that is a +triple of Approximately, Unit and Object as parameter. + +> data Quantity = Quantity Approximately Unit Object deriving (Show, Eq) + +The syntactic construct is overloaded and accepts one to three arguments. If +just one is given it is assumed to be the Object and Approximately and Unit are +empty. Two arguments set Approximately and Unit, which is convenient when the +unit implies the object (minutes usually refer to the object time, for +example). + +> quantity = try quantityA <|> quantityB + +> quantityA = Quantity +> <$> approximately +> <* spaces1 +> <*> unit +> <*> (try (spaces1 *> object) <|> return "") + +> quantityB = Quantity +> <$> return (Exact (AmountStr "")) +> <*> return "" +> <*> object + +> testQuantityOverloaded = [ +> cmpQuantity "oven" (exactQuantity (AmountStr "") "" "oven") +> , cmpQuantity "10 min" (exactQuantity (AmountRatio (10%1)) "min" "") +> , cmpQuantity "100 g flour" (exactQuantity (AmountRatio (100%1)) "g" "flour") + +The first two are equivalent to + +> , cmpQuantity3 "_ _ oven" (exactQuantity (AmountStr "") "" "oven") "oven" +> , cmpQuantity3 "10 min _" (exactQuantity (AmountRatio (10%1)) "min" "") "10 min" + +Missing units must not be ommited. The version with underscore should be prefered. + +> , cmpQuantity3 "1 \"\" meal" (exactQuantity (AmountRatio (1%1)) "" "meal") "1 _ meal" +> , cmpQuantity "1 _ meal" (exactQuantity (AmountRatio (1%1)) "" "meal") +> ] + +Units and objects are just strings. However units should be limited to +`well-known metric units <well-known-units_>`_ and `some guidelines +<objects-and-annotations_>`_ apply to Objects as well. + +> type Unit = String +> unit = qstr +> +> type Object = String +> object = qstr + +Approximately is a wrapper for ranges, that is two amounts separated by a dash, +approximate amounts, prepended with a tilde and exact amounts without modifier. + +> data Approximately = +> Range Amount Amount +> | Approx Amount +> | Exact Amount +> deriving (Show, Eq) +> +> approximately = try range <|> try approx <|> exact +> range = Range <$> amount <*> (char '-' *> amount) +> approx = Approx <$> (char '~' *> amount) +> exact = Exact <$> amount + +> testQuantityApprox = [ +> cmpQuantity "1-2 _ bananas" (Right (Quantity (Range (AmountRatio (1%1)) (AmountRatio (2%1))) "" "bananas")) +> , cmpQuantity "1 - 2 _ bananas" parseError +> , cmpQuantity "1- 2 _ bananas" parseError +> , cmpQuantity "1 -2 _ bananas" parseError +> , cmpQuantity "~2 _ bananas" (Right (Quantity (Approx (AmountRatio (2%1))) "" "bananas")) +> , cmpQuantity "~ 2 _ bananas" parseError + +> ] + +Amounts are limited to rational numbers and strings. There are no real numbers +by design and implementations should avoid representing rational numbers as +IEEE float. They are not required and introduce ugly corner cases when +rounding while converting units for example. + +> data Amount = +> AmountRatio Rational +> | AmountStr String +> deriving (Show, Eq) +> +> amount = try ratio <|> AmountStr <$> qstr + +> testQuantityAmount = [ +> cmpQuantity "some _ pepper" (exactQuantity (AmountStr "some") "" "pepper") +> , cmpQuantity3 "\"some\"-\"a few\" _ bananas" (Right (Quantity (Range (AmountStr "some") (AmountStr "a few")) "" "bananas")) "some-\"a few\" _ bananas" +> , cmpQuantity "~\"the stars in your eyes\" _ bananas" (Right (Quantity (Approx (AmountStr "the stars in your eyes")) "" "bananas")) +> ] + +Rational numbers can be an integral, numerator and denominator, each separated +by a forward slash, just the numerator and denominator, again separated by a +forward slash or just a numerator with the default denominator 1 (i.e. ordinary +integral number). + +> ratio = let toRatio i num denom = AmountRatio ((i*denom+num)%denom) in +> try (toRatio <$> int <*> (char '/' *> int) <*> (char '/' *> int)) +> <|> try (toRatio <$> return 0 <*> int <*> (char '/' *> int)) +> <|> try (toRatio <$> return 0 <*> int <*> return 1) + +These are all equal. + +> testQuantityRatio = [ +> cmpQuantity "3 _ bananas" (exactQuantity (AmountRatio (3%1)) "" "bananas") +> , cmpQuantity3 "3/1 _ bananas" (exactQuantity (AmountRatio (3%1)) "" "bananas") "3 _ bananas" +> , cmpQuantity3 "3/0/1 _ bananas" (exactQuantity (AmountRatio (3%1)) "" "bananas") "3 _ bananas" + +XXtwo is num and denom + +> , cmpQuantity "3/5 _ bananas" (exactQuantity (AmountRatio (3%5)) "" "bananas") + +three is int, num and denom + +> , cmpQuantity "3/5/7 _ bananas" (exactQuantity (AmountRatio ((3*7+5)%7)) "" "bananas") + +> , cmpQuantity3 "10/3 _ bananas" (exactQuantity (AmountRatio (10%3)) "" "bananas") "3/1/3 _ bananas" + +Can be used with ranges and approximate too. and mixed with strings + +> , cmpQuantity "1-\"a few\" _ bananas" (Right (Quantity (Range (AmountRatio (1%1)) (AmountStr "a few")) "" "bananas")) +> , cmpQuantity "1/1/2-2 _ bananas" (Right (Quantity (Range (AmountRatio (3%2)) (AmountRatio (4%2))) "" "bananas")) +> , cmpQuantity "~1/1/2 _ bananas" (Right (Quantity (Approx (AmountRatio (3%2))) "" "bananas")) + +> ] + +Appendix +++++++++ + +> int = read <$> many1 digit +> parse = runParser recipe () "" + +Test helpers: + +> isLeft (Left _) = True +> isLeft _ = False + +> parseError = Left (newErrorUnknown (newPos "" 0 0)) +> cmpParser f str (Left _) = TestCase $ assertBool str $ isLeft $ runParser (f <* eof) () "" str +> cmpParser f str expected = str ~: runParser (f <* eof) () "" str ~?= expected + +> cmpParseSerialize f str expectp@(Left _) _ = [cmpParser f str expectp] +> cmpParseSerialize f str expectp@(Right expectpval) expects = [ +> cmpParser f str expectp +> , serialize expectpval ~?= expects] + +Wrap qstr test in AmountStr to aid serialization test + +> cmpQstr input expectp = cmpQstr3 input expectp input +> cmpQstr3 input (Left expect) _ = [cmpParser (AmountStr <$> qstr) input (Left expect)] +> cmpQstr3 input (Right expect) expects = cmpParseSerialize (AmountStr <$> qstr) input (Right (AmountStr expect)) expects + +> cmpQuantity a b = cmpQuantity3 a b a +> cmpQuantity3 = cmpParseSerialize quantity + +> cmpOperation a b = cmpOperation3 a b a +> cmpOperation3 = cmpParseSerialize operation + +> exactQuantity a b c = Right (Quantity (Exact a) b c) + +> test = [ +> "quantity" ~: testQuantityOverloaded ++ testQuantityApprox ++ testQuantityAmount ++ testQuantityRatio +> , "qstr" ~: testQstr +> , "oparg" ~: testOparg +> , "opterm" ~: testOpterm +> ] + +> isResult (Result _) = True +> isResult _ = False +> isReference (Reference _) = True +> isReference _ = False +> isAlternative (Alternative _) = True +> isAlternative _ = False +> isAnnotation (Annotation _) = True +> isAnnotation _ = False +> isAction (Action _) = True +> isAction _ = False + diff --git a/src/Codec/Pesto/Parse.lhs-boot b/src/Codec/Pesto/Parse.lhs-boot new file mode 100644 index 0000000..6a6dee9 --- /dev/null +++ b/src/Codec/Pesto/Parse.lhs-boot @@ -0,0 +1,20 @@ +> module Codec.Pesto.Parse where + +> data Operation = +> Annotation String +> | Ingredient Quantity +> | Tool Quantity +> | Action String +> | Reference Quantity +> | Result Object +> | Alternative Object +> data Quantity = Quantity Approximately Unit Object +> type Unit = String +> type Object = String +> data Approximately = +> Range Amount Amount +> | Approx Amount +> | Exact Amount +> data Amount = +> AmountRatio Rational +> | AmountStr String diff --git a/src/Codec/Pesto/Serialize.lhs b/src/Codec/Pesto/Serialize.lhs new file mode 100644 index 0000000..5b3007e --- /dev/null +++ b/src/Codec/Pesto/Serialize.lhs @@ -0,0 +1,69 @@ +Serializing +----------- + +.. class:: nodoc + +> module Codec.Pesto.Serialize (serialize) where +> import Data.Char (isSpace, toLower, isLetter) +> import Data.Ratio (numerator, denominator) +> +> import {-# SOURCE #-} Codec.Pesto.Parse + +> class Serializeable a where +> serialize :: a -> String + +.. class:: todo + +- Add instance for graph +- use :math:`\mathcal{O}(1)` string builder + + +Finally transform linear stream of operations into a string again: + +> instance Serializeable a => Serializeable [a] where +> serialize ops = unlines $ map serialize ops + +> instance Serializeable Operation where +> serialize (Annotation s) = quote '(' ')' s +> serialize (Ingredient q) = '+':serialize q +> serialize (Tool q) = '&':serialize q +> serialize (Action s) = quote '[' ']' s +> serialize (Reference q) = '*':serialize q +> serialize (Result s) = '>':serializeQstr s +> serialize (Alternative s) = '|':serializeQstr s + +> instance Serializeable Quantity where +> serialize (Quantity a b "") = serialize a ++ " " ++ serializeQstr b +> serialize (Quantity (Exact (AmountStr "")) "" c) = serializeQstr c +> serialize (Quantity a "" c) = serialize a ++ " _ " ++ serializeQstr c +> serialize (Quantity a b c) = serialize a ++ " " ++ serializeQstr b ++ " " ++ serializeQstr c + +> instance Serializeable Approximately where +> serialize (Range a b) = serialize a ++ "-" ++ serialize b +> serialize (Approx a) = '~':serialize a +> serialize (Exact a) = serialize a + +There are two special cases here, both for aesthetic reasons: + +1) If the denominator is one we can just skip printing it, because + :math:`\frac{2}{1} = 2` and +2) if the numerator is larger than the denominator use mixed fraction notation, + because :math:`\frac{7}{2} = 3+\frac{1}{2}` + +> instance Serializeable Amount where +> serialize (AmountRatio a) | denominator a == 1 = show (numerator a) +> serialize (AmountRatio a) | numerator a > denominator a = +> show full ++ "/" ++ show num ++ "/" ++ show denom +> where +> full = numerator a `div` denom +> num = numerator a - full * denom +> denom = denominator a +> serialize (AmountRatio a) = show (numerator a) ++ "/" ++ show (denominator a) +> serialize (AmountStr s) = serializeQstr s + +> serializeQstr "" = "_" +> serializeQstr s | (not . isLetter . head) s || hasSpaces s = quote '"' '"' s +> serializeQstr s = s +> hasSpaces = any isSpace +> quote start end s = [start] ++ concatMap (\c -> if c == end then ['\\', end] else [c]) s ++ [end] + |