From 93f4989835740ccd8ce4e622f899accf08fabe89 Mon Sep 17 00:00:00 2001
From: Will Kahn-Greene
Date: Tue, 25 Jul 2017 21:33:39 -0400
Subject: [PATCH 1/6] Throw some documentation down for design-by-docs

---
 docs/api.rst | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 docs/api.rst

diff --git a/docs/api.rst b/docs/api.rst
new file mode 100644
index 0000000..558a1d1
--- /dev/null
+++ b/docs/api.rst
@@ -0,0 +1,87 @@
+===========
+Treelib API
+===========
+
+Library for manipulating trees in Python made up of dicts and lists.
+
+
+.. py:func:: tree_get(tree, key, default=None)
+
+   Given a tree consisting of mappings (like dict) and indexable sequences (like
+   list), returns the value specified by the key.
+
+   The key is a period-delimited sequence of edges to traverse. If one of the
+   ediges doesn't exist, then the default is returned immediately.
+
+   Examples:
+
+   >>> tree_get({'a': 1}, 'a')
+   1
+   >>> tree_get({'a': 1}, 'b')
+   None
+   >>> tree_get({'a': {'b': 2}}, 'a.b')
+   2
+   >>> tree_get({'a': {'b': 2}}, 'a.b.c', default=55)
+   55
+
+   This supports sequences, too:
+
+   >>> tree_get({'a': [1, 2, 3]}, 'a.1')
+   2
+   >>> tree_get({'a': {'1': 2}}, 'a.1')
+   2
+
+   Both dict and list support getitem notation, so the ``1`` works fine.
+
+
+.. py:func:: tree_set(tree, key, value)
+
+   Given a tree consisting of mappings (like dict) and indexable sequences that
+   support getitem notation, sets the key to the value.
+
+   The key is a period-delimited sequence of edges to traverse. If one of the
+   ediges doesn't exist, then the edge is created using these rules:
+
+   1. if the next edge is an integer, then it creates a list
+   2. if the next edge is not an integer, then it creates a dict
+
+   This returns the tree which is mutated in place.
+
+   >>> tree_set({}, 'a', value=5)
+   {'a': 5}
+   >>> tree_set({}, 'a.b.c', value=5)
+   {'a': {'b': {'c': 5}}}
+
+   This will not create new list indexes:
+
+   >>> tree_set({}, 'a.1', value=5)
+   IndexError('list index out of range')
+
+   This is the same error you'd get if you tried to access an index that doesn't
+   exist in a list.
+
+
+   .. Note::
+
+      Other ideas
+
+      Add a ``create_indexes=True`` argument that'll let it fill in missing
+      indexes with ``None``:
+
+      >>> tree_set({}, 'a.1.b', value=5, create_indexes=True)
+      {'a': [None, {'b': 5}]}
+
+
+.. py:func:: tree_flatten(tree)
+
+   FIXME
+
+
+.. py:func:: tree_validate(tree, fun)
+
+   FIXME
+
+
+.. py:func:: tree_traverse(tree, fun)
+
+   FIXME

From c38280a27ba376c5b3dac17ba63f5b40160f2cae Mon Sep 17 00:00:00 2001
From: Will Kahn-Greene
Date: Tue, 25 Jul 2017 21:46:46 -0400
Subject: [PATCH 2/6] Add some of Lonnen's thoughts

---
 docs/api.rst | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/docs/api.rst b/docs/api.rst
index 558a1d1..026374c 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -33,6 +33,11 @@ Library for manipulating trees in Python made up of dicts and lists.
 
    Both dict and list support getitem notation, so the ``1`` works fine.
 
+   Some things to know about ``tree_get()``:
+
+   1. It doesn't alter the tree at all.
+   2. Once it hits an edge that's missing, it returns ``None`` or the default.
+
 
 .. py:func:: tree_set(tree, key, value)
 
@@ -52,7 +57,9 @@ Library for manipulating trees in Python made up of dicts and lists.
    >>> tree_set({}, 'a.b.c', value=5)
    {'a': {'b': {'c': 5}}}
 
-   This will not create new list indexes:
+   While ``tree_set`` does create new dicts and lists if they're missing, it
+   will not create new list indexes. Instead, it'll raise an ``IndexError``. For
+   example:
 
    >>> tree_set({}, 'a.1', value=5)
    IndexError('list index out of range')
@@ -61,23 +68,19 @@ Library for manipulating trees in Python made up of dicts and lists.
    exist in a list.
 
 
-   ..
Note::
-
-      Other ideas
-
-      Add a ``create_indexes=True`` argument that'll let it fill in missing
-      indexes with ``None``:
-
-      >>> tree_set({}, 'a.1.b', value=5, create_indexes=True)
-      {'a': [None, {'b': 5}]}
-
-
 .. py:func:: tree_flatten(tree)
 
-   FIXME
+   Flattens a tree into a dict with keys of paths.
+
+   >>> tree_flatten({'a': 1})
+   {'a': 1}
+   >>> tree_flatten({'a': {'b': 1, 'c': 2}})
+   {'a.b': 1, 'a.c': 2}
+   >>> tree_flatten({'a': [{'b': 1}, {'c': 2}]})
+   {'a.0.b': 1, 'a.1.c': 2}
 
 
-.. py:func:: tree_validate(tree, fun)
+.. py:func:: tree_validate(tree, schema)
 
    FIXME
 
From 73e2f34203e2b7d28bb92cbebd6f20d8d865aa83 Mon Sep 17 00:00:00 2001
From: Will Kahn-Greene
Date: Fri, 28 Jul 2017 12:54:57 -0400
Subject: [PATCH 3/6] Lots of changes

* adjust syntax for paths to distinguish between keys and indices
* rework tree_set to try to make it straight-forward but flexible for the
  myriad of scenarios when edges are missing
---
 docs/api.rst | 177 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 144 insertions(+), 33 deletions(-)

diff --git a/docs/api.rst b/docs/api.rst
index 026374c..7a2815b 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -5,13 +5,66 @@ Treelib API
 Library for manipulating trees in Python made up of dicts and lists.
 
 
-.. py:func:: tree_get(tree, key, default=None)
+Paths
+=====
 
-   Given a tree consisting of mappings (like dict) and indexable sequences (like
-   list), returns the value specified by the key.
+Paths are period-delimited set of edges to take. Edges can be:
 
-   The key is a period-delimited sequence of edges to traverse. If one of the
-   ediges doesn't exist, then the default is returned immediately.
+1. a key (for a dict)
+2. an index (for a list)
+
+Example paths::
+
+    a
+    a.[1].foo_bar.Bar
+    a.b.[-1].Bar
+
+
+Key
+---
+
+Keys are identifiers that:
+
+1. are made of ascii alphanumeric characters, hyphens, and underscores
+2.
are at least one character long
+
+For example, these are all valid keys::
+
+    a
+    foo
+    FooBar
+    Foo-Bar
+    foo_bar
+
+
+Index
+-----
+
+Indexes are 0-based list indexes. They are:
+
+1. integers
+2. wrapped in ``[`` and ``]``
+3. can be negative
+
+For example, these are all valid indexes::
+
+    [0]
+    [1]
+    [-50]
+
+
+API
+===
+
+.. py:func:: tree_get(tree, path, default=None)
+
+   Given a tree consisting of dicts and lists, returns the value specified by
+   the path.
+
+   Some things to know about ``tree_get()``:
+
+   1. It doesn't alter the tree.
+   2. Once it hits an edge that's missing, it returns the default.
 
    Examples:
 
@@ -23,49 +76,97 @@ Library for manipulating trees in Python made up of dicts and lists.
    2
    >>> tree_get({'a': {'b': 2}}, 'a.b.c', default=55)
    55
-
-   This supports sequences, too:
-
-   >>> tree_get({'a': [1, 2, 3]}, 'a.1')
-   2
    >>> tree_get({'a': {'1': 2}}, 'a.1')
    2
+   >>> tree_get({'a': [1, 2, 3]}, 'a.[1]')
+   2
+   >>> tree_get({'a': [{}, {'b': 'foo'}]}, 'a.[1].b')
+   'foo'
 
-   Both dict and list support getitem notation, so the ``1`` works fine.
 
-   Some things to know about ``tree_get()``:
+.. py:func:: tree_set(tree, path, value, mutate=True, create_missing=False)
 
-   1. It doesn't alter the tree at all.
-   2. Once it hits an edge that's missing, it returns ``None`` or the default.
+   Given a tree consisting of dicts and lists, sets the item specified by path
+   to the specified value.
 
+   If one of the edges doesn't exist, then this raises either a ``KeyError``
+   for dicts or an ``IndexError`` for lists.
 
-.. py:func:: tree_set(tree, key, value)
+   :arg boolean mutate: If ``mutate`` is ``True`` (the default), then this
+      changes the tree in place and returns the mutated tree.
 
-   Given a tree consisting of mappings (like dict) and indexable sequences that
-   support getitem notation, sets the key to the value.
+      If ``mutate`` is ``False``, then this does a deepcopy of the tree,
+      changes the copy, and returns the copy. This is expensive.
 
-   The key is a period-delimited sequence of edges to traverse. If one of the
-   ediges doesn't exist, then the edge is created using these rules:
+   :arg boolean create_missing: If ``create_missing`` is ``False`` (the default),
+      then this will raise a ``KeyError`` for failed dict keys and
+      ``IndexError`` for failed list indexes.
 
-   1. if the next edge is an integer, then it creates a list
-   2. if the next edge is not an integer, then it creates a dict
+      If ``create_missing`` is ``True``, and this isn't
+      the last item in the path, then this will create the intermediary
+      dict/list.
 
-   This returns the tree which is mutated in place.
+      If the next edge is a key, it'll create a dict. If the next edge is an
+      index, then it'll create a list filling in ``None`` for the required
+      indices.
 
-   >>> tree_set({}, 'a', value=5)
+   Here are some examples.
+
+   This sets ``a`` to 5. This isn't affected by ``create_missing``.
+
+   >>> tree_set({}, 'a', value=5, create_missing=True)
+   {'a': 5}
+   >>> tree_set({}, 'a', value=5, create_missing=False)
+   {'a': 5}
+
+   This tries to traverse ``a``, but it doesn't exist and it's not the last
+   edge in the path. The next edge is ``b``, which is a key, so it first sets
+   ``a`` to an empty dict, then proceeds.
+
+   >>> tree_set({}, 'a.b', value=5, create_missing=True)
+   {'a': {'b': 5}}
+
+   This tries to traverse ``a``, but it doesn't exist and it's not the last
+   edge in the path. The next edge is ``[2]``, which is an index, so it first
+   sets ``a`` to a list of 3 ``None`` values, then proceeds.
+
+   >>> tree_set({}, 'a.[2]', value=5, create_missing=True)
+   {'a': [None, None, 5]}
+
+   This is similar, but with a negative index.
+
+   >>> tree_set({}, 'a.[-1]', value=5, create_missing=True)
+   {'a': [5]}
+
+   This creates missing indices in an existing list.
+
+   >>> tree_set({'a': []}, 'a.[2]', value=5, create_missing=True)
+   {'a': [None, None, 5]}
+
+
+   Examples:
+
+   These don't mutate the tree:
+
+   >>> tree = {'a': {'b': {'c': 1}}}
+   >>> tree_set(tree, 'a', value=5, mutate=False)
    {'a': 5}
-   >>> tree_set({}, 'a.b.c', value=5)
-   {'a': {'b': {'c': 5}}}
+   >>> tree_set(tree, 'a.b.c', value=[], mutate=False)
+   {'a': {'b': {'c': []}}}
 
-   While ``tree_set`` does create new dicts and lists if they're missing, it
-   will not create new list indexes. Instead, it'll raise an ``IndexError``. For
-   example:
+   These raise errors if an edge is missing:
 
-   >>> tree_set({}, 'a.1', value=5)
-   IndexError('list index out of range')
+   >>> tree_set({}, 'a.b.c', value=5)
+   KeyError ...
+   >>> tree_set({}, 'a.[1].b', value=5)
+   IndexError ...
+
+   These create missing edges and indexes:
 
-   This is the same error you'd get if you tried to access an index that doesn't
-   exist in a list.
+   >>> tree_set({}, 'a.b.c', value=5, create_missing=True)
+   {'a': {'b': {'c': 5}}}
+   >>> tree_set({}, 'a.[1].b', value=5, create_missing=True)
+   {'a': [None, {'b': 5}]}
 
 
 .. py:func:: tree_flatten(tree)
@@ -77,7 +178,17 @@ Library for manipulating trees in Python made up of dicts and lists.
    >>> tree_flatten({'a': {'b': 1, 'c': 2}})
    {'a.b': 1, 'a.c': 2}
    >>> tree_flatten({'a': [{'b': 1}, {'c': 2}]})
-   {'a.0.b': 1, 'a.1.c': 2}
+   {'a.[0].b': 1, 'a.[1].c': 2}
+
+   .. Note::
+
+      At this point, a flattened tree can't be used with ``tree_get`` and
+      ``tree_set``.
+
+
+.. py:func:: tree_setdefault(tree, default_tree)
+
+   FIXME
 
 
 ..
py:func:: tree_validate(tree, schema)

From 87e2fa30a324bbd3703c3826df228b3bfe4eaf57 Mon Sep 17 00:00:00 2001
From: Will Kahn-Greene
Date: Fri, 28 Jul 2017 13:05:16 -0400
Subject: [PATCH 4/6] Add a mission

---
 docs/api.rst | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/docs/api.rst b/docs/api.rst
index 7a2815b..61577df 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -5,6 +5,47 @@ Treelib API
 Library for manipulating trees in Python made up of dicts and lists.
 
 
+Goals
+=====
+
+The primary goal of this library is to make it less unwieldy to manipulate trees
+made up of Python dicts and lists.
+
+For example, say we want to get a value deep in the tree. We could do this::
+
+    value = tree['a']['b']['c']
+
+
+That'll throw a ``KeyError`` if any of those bits are missing. So you could
+handle that::
+
+    try:
+        value = tree['a']['b']['c']
+    except KeyError:
+        value = None
+
+
+Alternatively, you could do this::
+
+    value = tree.get('a', {}).get('b', {}).get('c', None)
+
+
+These work, but both are unwieldy, especially if you're doing this a lot.
+
+Similarly, setting things deep is also unenthusing::
+
+    tree['a']['b']['c'] = 5
+
+
+The safer form is this::
+
+    tree.setdefault('a', {}).setdefault('b', {})['c'] = 5
+
+
+This library aims to make sane use cases for tree manipulation easier to read
+and think about.
+
+
 Paths
 =====
 

From 1722143004f89a4151911be4e2f08af65e01608e Mon Sep 17 00:00:00 2001
From: Will Kahn-Greene
Date: Fri, 28 Jul 2017 13:34:13 -0400
Subject: [PATCH 5/6] Add research and inspirations

---
 docs/api.rst | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/docs/api.rst b/docs/api.rst
index 61577df..3c8deb0 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -61,6 +61,10 @@ Example paths::
     a.b.[-1].Bar
 
 
+One nice thing about paths is that they're just strings, so you can compose them
+using string operations.
+
+
 Key
 ---
 
@@ -240,3 +244,43 @@ API
 ..
py:func:: tree_traverse(tree, fun)
 
    FIXME
+
+
+Research and Inspirations
+=========================
+
+Python ``defaultdict``
+----------------------
+
+Python has a ``defaultdict``:
+
+https://docs.python.org/3/library/collections.html#defaultdict-objects
+
+This doesn't handle lists and dicts well, though.
+
+We'd have to either create the original data structure as a defaultdict, or
+convert it to one.
+
+If you try to get something deep from a defaultdict, it mutates the
+structure.
+
+It doesn't easily support composable paths.
+
+
+jq processor
+------------
+
+jq has interesting filter syntax.
+
+https://stedolan.github.io/jq/manual/#Basicfilters
+
+
+Creating a new subclass of Python ``dict``
+------------------------------------------
+
+We could do that and add ``get_path`` and ``set_path``, but I wonder if we can
+get the utility we want without having to box/unbox data.
+
+If we're just working with dicts and lists and standard Python things, then
+``json.dumps`` and other things just work without us having to do anything about
+them.

From e0fcfcafacafcba2474008f8078c53f597af10f6 Mon Sep 17 00:00:00 2001
From: Will Kahn-Greene
Date: Fri, 11 Aug 2017 08:28:33 -0400
Subject: [PATCH 6/6] Tweak some wording for clarity

---
 docs/api.rst | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/docs/api.rst b/docs/api.rst
index 3c8deb0..da28635 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -49,7 +49,7 @@ and think about.
 Paths
 =====
 
-Paths are period-delimited set of edges to take. Edges can be:
+A path is a string specifying a period-delimited list of edges. Edges can be:
 
 1. a key (for a dict)
 2. an index (for a list)
@@ -61,17 +61,18 @@ Example paths::
     a.b.[-1].Bar
 
 
-One nice thing about paths is that they're just strings, so you can compose them
-using string operations.
+Paths can be composed using string operations since they're just strings.
+
+FIXME(willkg): Add diagram showing a tree with edges specified by a path.
 
 
 Key
 ---
 
-Keys are identifiers that:
+Keys are identifiers that are:
 
-1. are made of ascii alphanumeric characters, hyphens, and underscores
-2. are at least one character long
+1. composed entirely of ASCII alphanumeric characters, hyphens, and underscores
+2. at least one character long
 
 For example, these are all valid keys::
 
@@ -85,7 +86,7 @@ For example, these are all valid keys::
 Index
 -----
 
-Indexes are 0-based list indexes. They are:
+Indexes indicate a 0-based list index. They are:
 
 1. integers
 2. wrapped in ``[`` and ``]``
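
The API documented in this series could be prototyped roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: ``parse_path`` is a hypothetical helper that does not appear in the docs, a ``None`` list item is treated as a missing edge when ``create_missing=True``, a mismatch between an edge and its container (for example a key applied to a list) is not handled in ``tree_set``, and ``tree_flatten`` drops empty dicts and lists.

```python
import copy
import re

# Hypothetical helper patterns (not part of the documented API).
_KEY_RE = re.compile(r"[A-Za-z0-9_-]+")
_INDEX_RE = re.compile(r"\[(-?\d+)\]")


def parse_path(path):
    """Split a path into edges: str for keys, int for indexes."""
    edges = []
    for part in path.split("."):
        match = _INDEX_RE.fullmatch(part)
        if match:
            edges.append(int(match.group(1)))
        elif _KEY_RE.fullmatch(part):
            edges.append(part)
        else:
            raise ValueError("invalid edge %r in path %r" % (part, path))
    return edges


def tree_get(tree, path, default=None):
    """Return the value at path, or default as soon as an edge is missing."""
    node = tree
    for edge in parse_path(path):
        # A key edge only applies to dicts, an index edge only to lists.
        expected = list if isinstance(edge, int) else dict
        if not isinstance(node, expected):
            return default
        try:
            node = node[edge]
        except (KeyError, IndexError):
            return default
    return node


def tree_set(tree, path, value, mutate=True, create_missing=False):
    """Set the item at path to value and return the (possibly copied) tree."""
    if not mutate:
        tree = copy.deepcopy(tree)
    edges = parse_path(path)
    node = tree
    for pos, edge in enumerate(edges):
        if create_missing and isinstance(edge, int):
            # Pad with None so the index exists: [2] needs 3 items, [-1] needs 1.
            needed = edge + 1 if edge >= 0 else -edge
            node.extend([None] * (needed - len(node)))
        if pos == len(edges) - 1:
            node[edge] = value  # IndexError if a list index is out of range
            break
        if isinstance(edge, int):
            missing = node[edge] is None  # assumption: None is a placeholder
        else:
            missing = edge not in node
        if create_missing and missing:
            # The next edge decides the container: index -> list, key -> dict.
            node[edge] = [] if isinstance(edges[pos + 1], int) else {}
        node = node[edge]  # KeyError or IndexError when the edge is missing
    return tree


def tree_flatten(tree):
    """Flatten a tree into a dict keyed by path (empty containers are dropped)."""
    flat = {}

    def walk(node, prefix):
        if isinstance(node, dict):
            items = list(node.items())
        elif isinstance(node, list):
            items = [("[%d]" % i, child) for i, child in enumerate(node)]
        else:
            flat[prefix] = node
            return
        for edge, child in items:
            walk(child, "%s.%s" % (prefix, edge) if prefix else str(edge))

    walk(tree, "")
    return flat
```

Keeping ``parse_path`` separate means the key and index rules from the Paths section live in one place, so ``tree_get`` and ``tree_set`` only ever see a list of ``str`` and ``int`` edges.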