Skip to content

Ordered Trees

Nathan Carter edited this page Mar 30, 2020 · 1 revision

Both the IT and the OT are ordered trees, meaning that the order of the children within a parent node is relevant. This section covers the data and relations we care about within ordered trees, and includes some of the specifics of how they will be implemented in the LDE, and thus what features are available for use in the rest of this design document.

Structures

Definition (Structure). We call each node in either ordered tree (the IT or the OT) a Structure. These are the basic level of meaning within the LDE. They can represent many different kinds of concepts, including (but not limited to) all of the examples given below.

First, recall that, as covered in the Design Overview, the Structure class will have two subclasses, Input Structure (IS) and Output Structure (OS). Furthermore, ISs are split into Input Expressions (IEs), which interpret to a specific meaning, and Input Modifiers (IMs), which do not. Thus all the examples below, being content a user might express, would be something that the client would express, on the user's behalf, using an IE.

  1. Mathematical concepts
    1. Inline data
      1. expressions, like $x^2+1=0$ or $\delta$
      2. Perhaps we will need constant or variable declarations; it depends on the formal system.
    2. Block-level data
      1. proofs and subproofs, including those parameterized by variables
      2. exercises or homework problems
      3. definitions, such as new language constructs or new rules of inference
      4. examples, which are environments in which anything you declare ends its scope at the end of the example and does not pollute the rest of the document
  2. Non-mathematical concepts
    1. Inline data
      1. text that labels something nearby, such as (*)
      2. the number at the beginning of a line in a numbered list
    2. Block-level data
      1. chapters or sections, which may encompass entire sets of definitions, theorems, etc., grouping them into a topic

We should have an IE subclass for each common mathematical concept that mathematics students should be learning to express and reason with, including those on the list above, but not limited to that list. Their behavior should be the same in the LDE as it is in mathematics. This helps prevent Lurch from adding unnecessarily to the already challenging learning curve in proof-based mathematics.

Because we use Structures to form trees, Structures can be nested, but only acyclically. Furthermore, the IT is composed entirely of ISs and the OT entirely of OSs, no mixing of the two. There may be other rules preventing arbitrary nesting of structures, such as not putting a proof inside a variable, but the tedious details of such rules are not our focus here.

The Structure class must provide a method for converting any tree of Structures into JSON form (serialization) and converting any such JSON form back into a tree of Structures (deserialization). This permits us, for example, to save Structure trees to a filesystem, or transmit them over network or thread boundaries.

Attributes of Structures

Every Structure contains a key-value dictionary called its attributes. In our implementation of the LDE, all keys must be strings and all values must be JSON data.

Any Structure can have an arbitrary number of "connections" to other Structures in the same tree. Each connection is a triple whose meaning is like an edge in a graph: it contains the source Structure, the destination Structure, and a (possibly empty) chunk of JSON data storing attributes of the connection itself. Because a Structure hierarchy can contain any number of these connections, including repetitions of the same triple, a Structure may be connected to many other Structures in many ways, even to the same Structure multiple times in the same way.

All of this connection data is implemented using JSON data in the attributes of the source and destination Structures. Although such JSON data may be capable of expressing connections between a node in the IT to a node outside the IT, we will ensure that our implementation polices such possibilities away, to ensure that our implementation matches the above theoretical reality.

Defintion (connection closure). We say that a tree (such as the IT or the OT) is connection-closed or has connection closure if no connection leads from any of its structures to any structure not in that tree, nor from a structure outside the tree to a structure inside the tree.

Our original definition of trees implies connection closure, but we make this definition to emphasize that our implementation must therefore ensure connection closure as well. This concept reappears on various pages of this wiki that discuss implementation.

Because of the existence of connections, we may sometimes refer to Structure hierarchies as "Relational Trees." One can view a relational tree as an undecorated tree coupled with zero or more multi-relations $R_1,\ldots,R_m$ on the set of nodes in the tree.

Every Structure will also have a boolean flag called "dirty," indicating whether the structure needs to be processed in some way (details on the Algorithms page). The IS base class will initialize dirty flags to false and the OS base class will initialize dirty flags to true.

In the IT, meaning is constructed recursively from children and dirty flags are used to indicate a need to recompute meaning. From these facts, we conclude that if an IS needs to have its meaning recomputed, then so does its parent, and its grandparent, and so on. The IS base class will therefore provide a method markDirty() to reify this conclusion: X.markDirty() sets the dirty flag of $X$ to be true, as well as that of every ancestor of $X$.

Relations within Structures

We define three relations among the nodes of the IT and the OT.

Definition (accessibility). We say that a Structure $X$ is accessible to a Structure $Y$, written $X\prec Y$, if some ancestor Structure of $Y$ (possibly $Y$ itself) is a sibling of $X$, but $X$ is the strictly earlier of the two siblings within their parent structure. This embodies the notion of which statements in a hierarchy of subproofs are accessible to be cited by later statements in the same proof.

Equivalently, we can speak of "scope" rather than accessibility. The scope of a structure $X$ is all later siblings of $X$ in the same parent, along with all their descendants. Thus $Y$ is in the scope of $X$ iff $X$ is accessible to $Y$. We can write $Y\succ X$ to mean that $Y$ is in the scope of $X$, which is synonymous with $X\prec Y$.

Definition (appearance). We say that a Structure $X$ appears before a Structure $Y$, written $X \ll Y$, if it would appear before it in a listing of the nodes in the tree produced by a depth-first traversal of that tree. This is the order in which entries in a filesystem are listed on screen in computer file browsers, as one expands folders and subfolders.

We can discuss both $\prec$ and $\ll$ in the IT and also in the OT. In both cases, $X\prec Y$ implies $X \ll Y$ but the converse does not always hold. Note that $\prec$ is a strict partial order and $\ll$ is a strict total order on the set of nodes in either tree.

Definition (connection). We say that a Structure $X$ is connected to a Structure $Y$, written $X\to Y$, if there is a connection (as introduced in the section on attributes, above for which $X$ is the source and $Y$ is the destination. If the connection has additional JSON data attached, we can summarize or suggest it in the notation, such as $X\xrightarrow{\text{type:reason}}Y$.

The Algorithms page in this wiki will stipulate how interpretation must interact with these relations.

Clone this wiki locally