author: Lars-Dominik Braun <lars@6xq.net> 2022-09-07 15:07:04 +0200
committer: Lars-Dominik Braun <lars@6xq.net> 2022-09-07 15:07:04 +0200
commit: b282af35ad4b0bb8d90e517f4b9ff03c22234090
tree: d4b9834fe836e77d1253794c19ca0735e291716a /src/lib/Codec/Pesto/Parse.lhs
parent: 8571736188131acac9540814aeb4d4da99ab2454
Copy-edit specification
Diffstat (limited to 'src/lib/Codec/Pesto/Parse.lhs')
-rw-r--r-- src/lib/Codec/Pesto/Parse.lhs | 115
1 files changed, 63 insertions, 52 deletions
diff --git a/src/lib/Codec/Pesto/Parse.lhs b/src/lib/Codec/Pesto/Parse.lhs
index ef9a908..762fff4 100644
--- a/src/lib/Codec/Pesto/Parse.lhs
+++ b/src/lib/Codec/Pesto/Parse.lhs
@@ -34,7 +34,7 @@ Language syntax
> import Codec.Pesto.Serialize (serialize)
Pesto parses `UTF-8 <https://tools.ietf.org/html/rfc3629>`_ encoded input data
-consisting of space-delimited instructions. Every character within the Unicode
+consisting of space-delimited tokens. Every character within the Unicode
whitespace class is considered a space.
.. _spaces1:
@@ -88,28 +88,30 @@ Here are examples for both:
> testOpterm = [cmpInstruction "(skinless\nboneless)" (Right (Annotation "skinless\nboneless"))
> , cmpInstruction "[stir together]" (Right (Action "stir together"))
-> , cmpInstruction "[stir\\]together]" (Right (Action "stir]together"))]
+> , cmpInstruction "[stir\\]together]" (Right (Action "stir]together"))
+> , cmpInstruction "[stir [together]" (Right (Action "stir [together"))]
The second one starts with one identifying character, ignores the following
-whitespace characters and then consumes an object or a quantity.
+whitespace characters, and then consumes a ``Quantity``.
> oparg :: Char -> Parsec String () Instruction -> Parsec String () Instruction
> oparg ident cont = char ident *> spaces *> cont
+>
> ingredient = oparg '+' (Ingredient <$> quantity)
> tool = oparg '&' (Tool <$> quantity)
> result = oparg '>' (Result <$> quantity)
> alternative = oparg '|' (Alternative <$> quantity)
> reference = oparg '*' (Reference <$> quantity)
-Additionally there are two special instructions. Directives are similar to the
-previous instructions, but consume a qstr.
+Additionally, there are two special instructions. Directives are similar to the
+previous instructions but consume a quoted string (``qstr``).
> directive = oparg '%' (Directive <$> qstr)
Unknown instructions are the fallthrough-case and accept anything. They must
not be discarded at this point. The point of accepting anything is to fail as
late as possible while processing input. This gives the parser a chance to
-print helpful mesages that provide additional aid to the user who can then fix
+print helpful messages that provide additional aid to the user, who can then fix
the problem.
> unknown = Unknown <$> many1 notspace
@@ -129,15 +131,16 @@ Below are examples for these instructions:
> , cmpInstruction3 "* \t\n 1 _ cheese"
> (Right (Reference (Quantity (Exact (AmountRatio (1%1))) "" "cheese")))
> "*1 _ cheese"
+> , cmpInstruction3 "!invalid" (Right (Unknown "!invalid")) "!invalid"
> ]
Qstr
++++
Before introducing quantities we need to have a look at qstr, which is used by
-them. A qstr, short for quoted string, can be – you guessed it already – a
-string enclosed in double quotes, a single word or the underscore character
-that represents the empty string.
+them. A qstr, short for quoted string, can be a string enclosed in double
+quotes, a single word or the underscore character that represents the
+empty string.
> qstr = try (betweenEscaped '"' '"')
> <|> word
@@ -157,11 +160,11 @@ not the empty string itself.
> , cmpQstr "_" (Right "")
> , cmpQstr "" parseError
-Any Unicode character with a General_Category major class L (i.e. a letter, see
+Any Unicode character with a General_Category major class L (i.e., a letter, see
`Unicode standard section 4.5
<http://www.unicode.org/versions/Unicode7.0.0/ch04.pdf>`_ for example) is
-accected as first character of a word. That includes german umlauts as well as
-greek or arabic script. Numbers, separators, punctuation and others are not
+accepted as the first character of a word. That includes German umlauts as well as
+Greek or Arabic script. Numbers, separators, punctuation, and others are not
permitted.
> , cmpQstr "water" (Right "water")
@@ -187,7 +190,7 @@ numbers, …
> , cmpQstr "sour\tcream" parseError
> , cmpQstr "white\nwine" parseError
-If a string contains spaces or starts with a special character it must be
+If a string contains spaces or starts with a special character, it must be
enclosed in double quotes.
> , cmpQstr3 "\"salt\"" (Right "salt") "salt"
@@ -196,7 +199,7 @@ enclosed in double quotes.
> , cmpQstr "\"1sugar\"" (Right "1sugar")
> , cmpQstr "\"chicken\tbreast\nmeat\"" (Right "chicken\tbreast\nmeat")
-Double quotes within a string can be quoted by prepending a backslash. However
+Double quotes within a string can be escaped by prepending a backslash. However,
the usual escape codes like \\n, \\t, … will *not* be expanded.
> , cmpQstr "\"vine\"gar\"" parseError
@@ -204,21 +207,20 @@ the usual escape codes like \\n, \\t, … will *not* be expanded.
> , cmpQstr "\"oli\\ve oil\"" (Right "oli\\ve oil")
> , cmpQstr "\"oli\\\\\"ve oil\"" (Right "oli\\\"ve oil")
> , cmpQstr3 "\"sal\\tmon\"" (Right "sal\\tmon") "sal\\tmon"
-> ]
+> ]
Quantity
++++++++
-The instructions Ingredient, Tool and Reference accept a *quantity*, that is a
-triple of Approximately, Unit and Object as parameter.
+A ``Quantity`` is a triple of ``Approximately``, ``Unit`` and ``Object``.
> data Quantity = Quantity Approximately Unit Object deriving (Show, Eq)
-The syntactic construct is overloaded and accepts one to three arguments. If
-just one is given it is assumed to be the Object and Approximately and Unit are
-empty. Two arguments set Approximately and Unit, which is convenient when the
-unit implies the object (minutes usually refer to the object time, for
-example).
+The syntactic construct is overloaded and accepts one to three
+arguments. If just one is given, it is assumed to be the ``Object``
+and ``Approximately`` and ``Unit`` are empty. Two arguments set
+``Approximately`` and ``Unit``, which is convenient when the unit implies
+the object (minutes usually refer to the object time, for example).
> quantity = try quantityA <|> quantityB
@@ -243,13 +245,13 @@ The first two are equivalent to
> , cmpQuantity3 "_ _ oven" (exactQuantity (AmountStr "") "" "oven") "oven"
> , cmpQuantity3 "10 min _" (exactQuantity (AmountRatio (10%1)) "min" "") "10 min"
-Missing units must not be ommited. The version with underscore should be prefered.
+Missing units must not be omitted. The version with underscore should be preferred.
-> , cmpQuantity3 "1 \"\" meal" (exactQuantity (AmountRatio (1%1)) "" "meal") "1 _ meal"
-> , cmpQuantity "1 _ meal" (exactQuantity (AmountRatio (1%1)) "" "meal")
-> ]
+> , cmpQuantity3 "1 \"\" meal" (exactQuantity (AmountRatio (1%1)) "" "meal") "1 _ meal"
+> , cmpQuantity "1 _ meal" (exactQuantity (AmountRatio (1%1)) "" "meal")
+> ]
-Units and objects are just strings. However units should be limited to
+Units and objects are just strings. However, units should be limited to
`well-known metric units <#well-known-units>`_.
> type Unit = String
@@ -258,8 +260,8 @@ Units and objects are just strings. However units should be limited to
> type Object = String
> object = qstr
-Approximately is a wrapper for ranges, that is two amounts separated by a dash,
-approximate amounts, prepended with a tilde and exact amounts without modifier.
+``Approximately`` is a wrapper for ranges (two amounts separated by a dash),
+approximate amounts (prefixed with a tilde), and exact amounts (no modifier).
> data Approximately =
> Range Amount Amount
@@ -279,13 +281,12 @@ approximate amounts, prepended with a tilde and exact amounts without modifier.
> , cmpQuantity "1 -2 _ bananas" parseError
> , cmpQuantity "~2 _ bananas" (Right (Quantity (Approx (AmountRatio (2%1))) "" "bananas"))
> , cmpQuantity "~ 2 _ bananas" parseError
-
-> ]
+> ]
Amounts are limited to rational numbers and strings. There are no real numbers
-by design and implementations should avoid representing rational numbers as
-IEEE float. They are not required and introduce ugly corner cases when
-rounding while converting units for example.
+by design, and implementations should avoid representing rational numbers as
+floating point numbers. They are not required and introduce ugly corner cases when
+rounding while converting units, for example.
> data Amount =
> AmountRatio Rational
@@ -300,9 +301,9 @@ rounding while converting units for example.
> , cmpQuantity "~\"the stars in your eyes\" _ bananas" (Right (Quantity (Approx (AmountStr "the stars in your eyes")) "" "bananas"))
> ]
-Rational numbers can be an integral, numerator and denominator, each separated
+Rational numbers can be an integral, numerator, and denominator, each separated
by a forward slash, just the numerator and denominator, again separated by a
-forward slash or just a numerator with the default denominator 1 (i.e. ordinary
+forward slash, or just a numerator with the default denominator 1 (i.e., ordinary
integral number).
> ratio = let toRatio i num denom = AmountRatio ((i*denom+num)%denom) in
@@ -310,36 +311,43 @@ integral number).
> <|> try (toRatio <$> return 0 <*> int <*> (char '/' *> int))
> <|> try (toRatio <$> return 0 <*> int <*> return 1)
-These are all equal.
+The following representations are all equal with the first one being
+the preferred one:
> testQuantityRatio = [
> cmpQuantity "3 _ bananas" (exactQuantity (AmountRatio (3%1)) "" "bananas")
-> , cmpQuantity3 "3/1 _ bananas" (exactQuantity (AmountRatio (3%1)) "" "bananas") "3 _ bananas"
-> , cmpQuantity3 "3/0/1 _ bananas" (exactQuantity (AmountRatio (3%1)) "" "bananas") "3 _ bananas"
+> , cmpQuantity3 "3/1 _ bananas" (exactQuantity (AmountRatio (3%1)) "" "bananas")
+> "3 _ bananas"
+> , cmpQuantity3 "3/0/1 _ bananas" (exactQuantity (AmountRatio (3%1)) "" "bananas")
+> "3 _ bananas"
-XXtwo is num and denom
+Two numbers are numerator and denominator:
> , cmpQuantity "3/5 _ bananas" (exactQuantity (AmountRatio (3%5)) "" "bananas")
-three is int, num and denom
+Three numbers add an integral part:
> , cmpQuantity "3/5/7 _ bananas" (exactQuantity (AmountRatio ((3*7+5)%7)) "" "bananas")
+> , cmpQuantity3 "10/3 _ bananas" (exactQuantity (AmountRatio (10%3)) "" "bananas")
+> "3/1/3 _ bananas"
-> , cmpQuantity3 "10/3 _ bananas" (exactQuantity (AmountRatio (10%3)) "" "bananas") "3/1/3 _ bananas"
-
-Can be used with ranges and approximate too. and mixed with strings
-
-> , cmpQuantity "1-\"a few\" _ bananas" (Right (Quantity (Range (AmountRatio (1%1)) (AmountStr "a few")) "" "bananas"))
-> , cmpQuantity "1/1/2-2 _ bananas" (Right (Quantity (Range (AmountRatio (3%2)) (AmountRatio (4%2))) "" "bananas"))
-> , cmpQuantity "~1/1/2 _ bananas" (Right (Quantity (Approx (AmountRatio (3%2))) "" "bananas"))
+Rational numbers can be used in ranges and mixed with strings too.
+> , cmpQuantity "1-\"a few\" _ bananas" (Right (Quantity
+> (Range (AmountRatio (1%1)) (AmountStr "a few")) "" "bananas"))
+> , cmpQuantity "1/1/2-2 _ bananas" (Right (Quantity
+> (Range (AmountRatio (3%2)) (AmountRatio (4%2))) "" "bananas"))
+> , cmpQuantity "~1/1/2 _ bananas" (Right (Quantity
+> (Approx (AmountRatio (3%2))) "" "bananas"))
> ]
Appendix
++++++++
-> int = read <$> many1 digit
+Parser main entry point.
+
> parse = runParser stream () ""
+> int = read <$> many1 digit
Test helpers:
@@ -378,9 +386,12 @@ Wrap qstr test in AmountStr to aid serialization test
> strQuantity = Quantity (Exact (AmountStr "")) ""
> test = [
-> "quantity" ~: testQuantityOverloaded ++ testQuantityApprox ++ testQuantityAmount ++ testQuantityRatio
-> , "qstr" ~: testQstr
-> , "oparg" ~: testOparg
+> "quantity" ~: testQuantityOverloaded
+> ++ testQuantityApprox
+> ++ testQuantityAmount
+> ++ testQuantityRatio
+> , "qstr" ~: testQstr
+> , "oparg" ~: testOparg
> , "opterm" ~: testOpterm
> ]
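
The rational-number rule touched by this patch (integral/numerator/denominator, numerator/denominator, or a bare numerator) can be tried in isolation with the sketch below. It is an illustration only, not the code in ``Parse.lhs``: it uses ``Text.ParserCombinators.ReadP`` from ``base`` instead of Parsec so that it runs without extra packages, the names ``ratioP`` and ``parseRatio`` are hypothetical, and rejecting a zero denominator is an assumption added here rather than something the patch specifies.

```haskell
import Data.Char (isDigit)
import Data.Ratio ((%))
import Text.ParserCombinators.ReadP

-- Hypothetical stand-alone helpers; not part of Codec.Pesto.Parse.

-- One or more decimal digits, read as an Integer.
int :: ReadP Integer
int = read <$> munch1 isDigit

-- Mirrors the three alternatives of `ratio` in the diff:
--   i/num/denom, num/denom, or a bare numerator with denominator 1.
ratioP :: ReadP Rational
ratioP = do
  a    <- int
  rest <- many (char '/' *> int)
  case rest of
    []     -> pure (a % 1)
    [d]    -> nonZero d (a % d)
    [n, d] -> nonZero d ((a * d + n) % d)
    _      -> pfail                      -- more than three numbers
  where
    -- Assumption of this sketch: a zero denominator is a parse failure.
    nonZero d r = if d == 0 then pfail else pure r

-- Succeeds only if the whole input is consumed.
parseRatio :: String -> Maybe Rational
parseRatio s = case [r | (r, "") <- readP_to_S (ratioP <* eof) s] of
  [r] -> Just r
  _   -> Nothing
```

Under these assumptions, ``parseRatio "3"``, ``parseRatio "3/1"`` and ``parseRatio "3/0/1"`` all give ``Just (3 % 1)``, and ``parseRatio "3/5/7"`` gives ``Just (26 % 7)``, matching the ``(3*7+5)%7`` expectation in ``testQuantityRatio``.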