From 1deb379e40fc613415c9f7f2b7ffccc19cf74cd5 Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Thu, 16 Nov 2023 11:40:30 -0500 Subject: [PATCH 01/18] rough start to array order --- ArrayOrder.md | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 76 insertions(+) create mode 100644 ArrayOrder.md diff --git a/ArrayOrder.md b/ArrayOrder.md new file mode 100644 index 0000000..832d52c --- /dev/null +++ b/ArrayOrder.md @@ -0,0 +1,76 @@ +## Arrays, coordinates, and indexes + +Zarr stores multi-dimensional arrays into regularly sized chunks. +Chunks are themselves multi-dimensional arrays of a smaller size than +the complete multidimensional array and are stored as a 1D array of +values, called a "flattened" array. + +Indexing into a multi-dimensional array is done with an ordered tuple, +each element of which indexes into one of the arrays dimensions. +Elements of this tuple are called "coordinates." For example, the tuple +`(i,j,k)` indexes a three-dimensional array, and `i`, `j`, and `k` are +its coordinates. We will call `i` the left-most index, and `k` the +right-most index (avoiding "first" and "last"). + +The only valid cordinates for arrays are the non-negative integers. + +The following will describe conventions for mapping multi-dimensional +indexes (tuples of integers) to the scalar integer index of the +flattened 1D array. + +### Reshaping and stride + +One can think of reshaping a 1D are a recursive process of +grouping a number of adjacent elements + +A 1D array is reshaped to an n-dimensional array by grouping a number +adjacent elements belonging to the same dimension. + +* **Define:** the "size" of a dimension is the number of grouped elements. + +The stride of the next dimension is the size of the previous dimension. + +* **Define:** the stride of a dimension is the (positive) step in the + flat array that corresponds to the adjacent element (step of one) + along that dimension. 
+ +* **Define:** the "first" dimension is the dimension with a stride of 1. +* **Define:** the "last" dimension is the dimension with the largest stride. + +#### example + +Suppose we have this flat array: + +`0, 1, 2, 3, 4, 5` + +and two dimensions having strides 1 and 3. + + +`(0, 1, 2), (3, 4, 5)` +`[(0, 1, 2), (3, 4, 5)]` + + +### row- / column- major + +The terms row- and column-major derive for the storage of matrices - +often represented as a 2D array of numbers. Horizontal groupings of +these numbers are called "rows" and vertical groupings are called +"columns." In mathematics, the entries of a matrix $A$ are denoted +$a_{ij}$. Where rows of the matrix are indexed by $i$ the "first", or +"left" index, and columns of the matrix are indexed by $j$, the +"second", or "right" index. + +Discussion of row- and column- major depends on first agreeing which +index (left or right) refers to rows vs columns, and for matrices in +mathematics, the left indexs **always** (in this author's experience) +refers to rows. + +### C- and F-order + +* **Define:** C-order reshapes flat arrays into multiple dimensions such + that the **rightmost** index has stride 1. +* **Define:** F-order reshapes flat arrays into multiple dimensions such + that the **leftmost** index has stride 1. + +These terms come from conventions for storing arrays in the C and +Fortran programming languages. From b8fbf89f70813f2943449c7cb7ddd195ca09ce4c Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Thu, 16 Nov 2023 15:35:13 -0500 Subject: [PATCH 02/18] array ordering progress * matrix convention and example * image convention --- ArrayOrder.md | 91 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 76 insertions(+), 15 deletions(-) diff --git a/ArrayOrder.md b/ArrayOrder.md index 832d52c..009d979 100644 --- a/ArrayOrder.md +++ b/ArrayOrder.md @@ -43,27 +43,13 @@ Suppose we have this flat array: `0, 1, 2, 3, 4, 5` -and two dimensions having strides 1 and 3. 
+and two dimensions having strides 1 and 3, equivalently having dimensions 3 and 2. `(0, 1, 2), (3, 4, 5)` `[(0, 1, 2), (3, 4, 5)]` -### row- / column- major - -The terms row- and column-major derive for the storage of matrices - -often represented as a 2D array of numbers. Horizontal groupings of -these numbers are called "rows" and vertical groupings are called -"columns." In mathematics, the entries of a matrix $A$ are denoted -$a_{ij}$. Where rows of the matrix are indexed by $i$ the "first", or -"left" index, and columns of the matrix are indexed by $j$, the -"second", or "right" index. - -Discussion of row- and column- major depends on first agreeing which -index (left or right) refers to rows vs columns, and for matrices in -mathematics, the left indexs **always** (in this author's experience) -refers to rows. ### C- and F-order @@ -74,3 +60,78 @@ refers to rows. These terms come from conventions for storing arrays in the C and Fortran programming languages. + +### array size + +When discussing an array that is stored in C-order, dimension size will +be descibed using a list if sizes per dimension. For example: `[ 3, 5, 7 ].` +In this example, the left-most dimension has size `3`, the right-most +dimension has size `7`. + +As always the *first* dimension will have stride 1. Because we're using +C-order, is the right-most index. As a result, the *second* dimension +will have stride `7`, and the *third* dimension will have stride `7*5 = +35`. + +Consider again an array of size `[ 3, 5, 7 ]`, but using F-order. +Again, the left-most dimension has size `3`, the right-most dimension +has size `7`. + +As always the *first* dimension will have stride 1. However, now using +F-order, the *second* dimension will have stride `3`, and the *second* +dimension will have stride `3*5=15`. + +### row- / column- major + +Matrices are often represented as a 2D array of numbers. 
Horizontal +groupings of these numbers are called "rows" and vertical groupings are +called "columns." In mathematics, the entries of a matrix $A$ are +denoted $a_{ij}$. Where rows of the matrix are indexed by $i$ the +"first", or "left" index, and columns of the matrix are indexed by $j$, +the "second", or "right" index. + +* **Universal Matrix Convention**: Left indexes refer to rows, + right indexes refer to columns. + +The terms row- and column-major derive for the storage of matrices. +Defining these terms first depends on first agreeing which index (left +or right) refers to rows vs columns, and for matrices in mathematics. + +* **Define:** Arrays storing matrices in "row-major" give columns stride 1. +* **Define:** Arrays storing matrices in "column-major" give rows stride 1. + +* **Consequence:** Given matrix storage conventions C-order storage is + equivalent to "row-major". +* **Consequence:** Given matrix storage conventions F-order storage is + equivalent to "column-major". + +#### example + +As a result of the *Universal Matrix Convention* the size of a matrix +with `2` rows and `3` columns is `[2, 3]` for both C- and F-orderings. +Consider: + +``` + column 0 column 1 column 2 + row 0 [ 0 1 2 ] + row 1 [ 3 4 5 ] +``` + +* The flat C-ordered array will be: `[0, 1, 2, 3, 4, 5]` +* The flat F-ordered array will be: `[0, 3, 1, 4, 2, 5] + + +### image analysis + +Most formats for storing image files store data such that the +"horizontal axis" / rows have a smaller stride than the "vertical axis" +/ columns. (Note: while rows have smaller stride than columns, it is +common for rows NOT to have stride 1, for example when using +"interleaved" color components, the "color" dimension often will have a +stride of 1.) 
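The interleaved-color case can be made concrete with a small Python sketch. It assumes the image is held in a C-ordered array indexed as `(y, x, c)`, and the image sizes are hypothetical. The color dimension gets stride 1. The horizontal step (`x`) is smaller than the vertical step (`y`), but it is not 1.

```python
# A minimal sketch, assuming an interleaved RGB image stored in C-order
# and indexed as (y, x, c); the sizes below are hypothetical.
height, width, channels = 480, 640, 3

stride_c = 1                  # color components of one pixel are adjacent
stride_x = channels           # next pixel along the same row (horizontal step)
stride_y = width * channels   # same x, next row down (vertical step)

def flat_index(y, x, c):
    """Position of element (y, x, c) in the flattened 1D buffer."""
    return y * stride_y + x * stride_x + c * stride_c

print(stride_y, stride_x, stride_c)  # 1920 3 1
print(flat_index(1, 2, 0))           # 1926
```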
+ + +### refs + +1) [nrrd axis ordering](https://teem.sourceforge.net/nrrd/format.html#general.4) +2) [n5 ordering discussion](https://github.com/saalfeldlab/n5/issues/31) From 135bfa53f3438b0f24074a21d7757126859290dc Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Thu, 16 Nov 2023 15:51:12 -0500 Subject: [PATCH 03/18] multi-dim arr indexing * change top level title * add conclusion * typo fixes --- ArrayOrder.md | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/ArrayOrder.md b/ArrayOrder.md index 009d979..0147f42 100644 --- a/ArrayOrder.md +++ b/ArrayOrder.md @@ -1,4 +1,4 @@ -## Arrays, coordinates, and indexes +## Multi-dimensional array indexing Zarr stores multi-dimensional arrays into regularly sized chunks. Chunks are themselves multi-dimensional arrays of a smaller size than @@ -81,7 +81,7 @@ As always the *first* dimension will have stride 1. However, now using F-order, the *second* dimension will have stride `3`, and the *second* dimension will have stride `3*5=15`. -### row- / column- major +### row- and column-major Matrices are often represented as a 2D array of numbers. Horizontal groupings of these numbers are called "rows" and vertical groupings are @@ -118,7 +118,7 @@ Consider: ``` * The flat C-ordered array will be: `[0, 1, 2, 3, 4, 5]` -* The flat F-ordered array will be: `[0, 3, 1, 4, 2, 5] +* The flat F-ordered array will be: `[0, 3, 1, 4, 2, 5]` ### image analysis @@ -130,6 +130,17 @@ common for rows NOT to have stride 1, for example when using "interleaved" color components, the "color" dimension often will have a stride of 1.) +## Conclusions + +If a convention is such that the meaning / interpretation of the index +in a particular position (left / rightmost) is strong, then C- and F- +order will agree on the ordering of an array's dimensions, but will +store the arrays differently when flattened. 
+ +If a convention is such that the meaning / interpretation of a +particular stride (fastest / slowest) is strong, then C- and F- order +will dis-agree on the ordering of an array's dimensions, but will store the +arrays the same way when flattened. ### refs From 7d0eb84b807d7131fc8deb6b964a337543523a41 Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Fri, 17 Nov 2023 14:30:40 -0500 Subject: [PATCH 04/18] typo fix, cartesian coords --- ArrayOrder.md | 41 +++++++++++++++++++++++++++++++++-------- 1 file changed, 33 insertions(+), 8 deletions(-) diff --git a/ArrayOrder.md b/ArrayOrder.md index 0147f42..a8c3ab9 100644 --- a/ArrayOrder.md +++ b/ArrayOrder.md @@ -12,16 +12,16 @@ Elements of this tuple are called "coordinates." For example, the tuple its coordinates. We will call `i` the left-most index, and `k` the right-most index (avoiding "first" and "last"). -The only valid cordinates for arrays are the non-negative integers. +The only valid coordinates for arrays are the non-negative integers. The following will describe conventions for mapping multi-dimensional -indexes (tuples of integers) to the scalar integer index of the +indexes (ordered tuples of integers) to the scalar integer index of the flattened 1D array. ### Reshaping and stride -One can think of reshaping a 1D are a recursive process of -grouping a number of adjacent elements +One can think of reshaping a 1D as a recursive process of grouping a +number of adjacent elements. A 1D array is reshaped to an n-dimensional array by grouping a number adjacent elements belonging to the same dimension. @@ -45,12 +45,10 @@ Suppose we have this flat array: and two dimensions having strides 1 and 3, equivalently having dimensions 3 and 2. 
- `(0, 1, 2), (3, 4, 5)` `[(0, 1, 2), (3, 4, 5)]` - ### C- and F-order * **Define:** C-order reshapes flat arrays into multiple dimensions such @@ -61,10 +59,11 @@ and two dimensions having strides 1 and 3, equivalently having dimensions 3 and These terms come from conventions for storing arrays in the C and Fortran programming languages. + ### array size When discussing an array that is stored in C-order, dimension size will -be descibed using a list if sizes per dimension. For example: `[ 3, 5, 7 ].` +be described using a list if sizes per dimension. For example: `[ 3, 5, 7 ].` In this example, the left-most dimension has size `3`, the right-most dimension has size `7`. @@ -95,11 +94,14 @@ the "second", or "right" index. The terms row- and column-major derive for the storage of matrices. Defining these terms first depends on first agreeing which index (left -or right) refers to rows vs columns, and for matrices in mathematics. +or right) refers to rows vs columns for matrices in mathematics. * **Define:** Arrays storing matrices in "row-major" give columns stride 1. * **Define:** Arrays storing matrices in "column-major" give rows stride 1. +The names come from the fact that varying the column index changes the +position along a single row + * **Consequence:** Given matrix storage conventions C-order storage is equivalent to "row-major". * **Consequence:** Given matrix storage conventions F-order storage is @@ -120,6 +122,16 @@ Consider: * The flat C-ordered array will be: `[0, 1, 2, 3, 4, 5]` * The flat F-ordered array will be: `[0, 3, 1, 4, 2, 5]` +To reiterate, the multi-dimensional indexes for both C- and F-order are: + +``` + column 0 column 1 column 2 + row 0 [ (0,0) (0,1) (0,2) ] + row 1 [ (1,0) (1,1) (1,2) ] +``` + +because, the row index is *always* the left-most index. 
+ ### image analysis @@ -130,6 +142,19 @@ common for rows NOT to have stride 1, for example when using "interleaved" color components, the "color" dimension often will have a stride of 1.) +### cartesian coordinates + +In contrast to the matrix row/column index convention, cartesian +coordinates label the horizontal and vertical dimensions `x` and `y` +respectively. Referencing positions in the 2D plane is done using +ordered two-tuples `(x,y)`, where `x` is conventionally the left-index +and `y` is the right-index. Using cartesian coordinates, varying +the left dimensions varies horizontal position, and varying the right +dimension varies the vertical position. + +Applications and workflows that make use of image geometry most commonly +use cartesian coordinates. + ## Conclusions If a convention is such that the meaning / interpretation of the index From d87203337a0580f8813f053703a92c6b25907b8b Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Mon, 27 Nov 2023 11:41:53 -0500 Subject: [PATCH 05/18] typo fix --- ArrayOrder.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ArrayOrder.md b/ArrayOrder.md index a8c3ab9..bd2c549 100644 --- a/ArrayOrder.md +++ b/ArrayOrder.md @@ -77,7 +77,7 @@ Again, the left-most dimension has size `3`, the right-most dimension has size `7`. As always the *first* dimension will have stride 1. However, now using -F-order, the *second* dimension will have stride `3`, and the *second* +F-order, the *second* dimension will have stride `3`, and the *third* dimension will have stride `3*5=15`. 
### row- and column-major From 74daf99f39b99f266e5cfcf551ac831b6ee48d4d Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Tue, 28 Nov 2023 10:31:29 -0500 Subject: [PATCH 06/18] flesh out grouping, first=left and last=right, interpretation --- ArrayOrder.md | 99 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 64 insertions(+), 35 deletions(-) diff --git a/ArrayOrder.md b/ArrayOrder.md index bd2c549..bfc8e82 100644 --- a/ArrayOrder.md +++ b/ArrayOrder.md @@ -9,8 +9,8 @@ Indexing into a multi-dimensional array is done with an ordered tuple, each element of which indexes into one of the arrays dimensions. Elements of this tuple are called "coordinates." For example, the tuple `(i,j,k)` indexes a three-dimensional array, and `i`, `j`, and `k` are -its coordinates. We will call `i` the left-most index, and `k` the -right-most index (avoiding "first" and "last"). +its coordinates. We will call `i` the "left" or "first" index, and `k` the +"right" or "last" index. The only valid coordinates for arrays are the non-negative integers. @@ -23,66 +23,94 @@ flattened 1D array. One can think of reshaping a 1D as a recursive process of grouping a number of adjacent elements. -A 1D array is reshaped to an n-dimensional array by grouping a number -adjacent elements belonging to the same dimension. +A (n-1)-dimnensional array is reshaped to an n-dimensional array by +grouping a number adjacent elements belonging to the same dimension. * **Define:** the "size" of a dimension is the number of grouped elements. -The stride of the next dimension is the size of the previous dimension. - * **Define:** the stride of a dimension is the (positive) step in the - flat array that corresponds to the adjacent element (step of one) - along that dimension. + flat array that corresponds to the adjacent element along that dimension. -* **Define:** the "first" dimension is the dimension with a stride of 1. -* **Define:** the "last" dimension is the dimension with the largest stride. 
+The stride of a dimension is the product of sizes of all previous dimensions. -#### example +* **Define:** the "inner" dimension is the dimension with a stride of 1. +* **Define:** the "outer" dimension is the dimension with the largest stride. + +#### examples Suppose we have this flat array: `0, 1, 2, 3, 4, 5` -and two dimensions having strides 1 and 3, equivalently having dimensions 3 and 2. +and two dimensions having sizes 3 and 2. The first stride is always 1. +The second stride is the previous dimenions' size: 3 in this example. So +our strides are 1 and 3. There is no grouping to be done for a +dimensions of stride one, so the first and only step is to group +elements into groups of 3 (the larger stride): `(0, 1, 2), (3, 4, 5)` -`[(0, 1, 2), (3, 4, 5)]` +
-### C- and F-order +a larger example + +Suppose we have this flat array: + +`0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23` + + +and three dimensions having sizes 2, 3, and 4. Their strides are 1, 2, and 6 +where `2*3 = 6`. There is no grouping to be done for a dimensions of stride 1, +so the first step is to join elements into groups of 2 (the +second stride): -* **Define:** C-order reshapes flat arrays into multiple dimensions such - that the **rightmost** index has stride 1. -* **Define:** F-order reshapes flat arrays into multiple dimensions such - that the **leftmost** index has stride 1. +`(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11), (12, 13), (14, 15), (16, 17), (18, 19), (20, 21), (22, 23)` -These terms come from conventions for storing arrays in the C and -Fortran programming languages. +Next group elements of the new list (which are themselves groups) into +groups of 3 (the largest stride). +`[(0, 1), (2, 3), (4, 5)], [(6, 7), (8, 9), (10, 11)], [(12, 13), (14, 15), (16, 17)], [(18, 19), (20, 21), (22, 23)]` + +Notice: +* The element adjacent to `0` in the inner group is `1`, hence stride `1`. +* The element adjacent to `0` in the intermediate grouping is `2`, hence stride `2`. +* The element adjacent to `0` in the outer grouping is `6`, hence stride `6`. + +
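The recursive grouping in the larger example can be sketched in Python. This is a minimal illustration, and the helpers `group`, `reshape_by_grouping`, and `strides_from_sizes` are hypothetical names. Each grouping step gathers as many elements as the corresponding dimension's size, starting from the stride-1 dimension. The outermost dimension needs no grouping step, because it is whatever remains.

```python
def group(flat, size):
    """Group consecutive elements of a list into tuples of `size` elements."""
    if len(flat) % size != 0:
        raise ValueError("length must be a multiple of the group size")
    return [tuple(flat[k:k + size]) for k in range(0, len(flat), size)]

def reshape_by_grouping(flat, sizes):
    """Reshape a flat list by repeated grouping; sizes[0] is the stride-1 dimension."""
    result = list(flat)
    for size in sizes[:-1]:
        result = group(result, size)
    return result

def strides_from_sizes(sizes):
    """Stride of each dimension = product of the sizes of all previous dimensions."""
    strides, step = [], 1
    for size in sizes:
        strides.append(step)
        step *= size
    return strides

nested = reshape_by_grouping(list(range(24)), [2, 3, 4])
print(nested[0])                        # ((0, 1), (2, 3), (4, 5))
print(len(nested))                      # 4
print(strides_from_sizes([2, 3, 4]))    # [1, 2, 6]

# Python's nested indexing reads the outermost group first, so index (i, j, k)
# with i the stride-1 dimension is nested[k][j][i] == i*1 + j*2 + k*6.
print(nested[3][1][0])                  # 20
```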
+ + +### C- and F-order + +* **Define:** C-order indexes multi-dimensional arrays such that the **last** index has stride 1. +* **Define:** F-order indexes multi-dimensional arrays such that the **first** index has stride 1. + +These terms come from conventions for storing arrays in the C and Fortran programming languages. ### array size -When discussing an array that is stored in C-order, dimension size will -be described using a list if sizes per dimension. For example: `[ 3, 5, 7 ].` -In this example, the left-most dimension has size `3`, the right-most -dimension has size `7`. +The size of a multidimensional array is described by a list of sizes per +dimension. For example: `[ 3, 5, 7 ].` In this example, the *first* +dimension has size `3`, the *last* dimension has size `7`. + +If this array is indexed using C-order, then the last index has stride +`1`. As a result, the middle index will have stride `7`, and the *first* +dimension will have stride `7*5 = 35`. -As always the *first* dimension will have stride 1. Because we're using -C-order, is the right-most index. As a result, the *second* dimension -will have stride `7`, and the *third* dimension will have stride `7*5 = -35`. +Consider again an array of size `[ 3, 5, 7 ]`, but using F-order +indexing. Again, the *first* dimension has size `3`, the *last* +dimension has size `7`. Now, however, using F-order, the +*first* dimension will have stride `1`, the *second* dimension will have +stride `3`, and the *third* dimension will have stride `3*5=15`. -Consider again an array of size `[ 3, 5, 7 ]`, but using F-order. -Again, the left-most dimension has size `3`, the right-most dimension -has size `7`. +## Interpretation of multi-dimensional arrays -As always the *first* dimension will have stride 1. However, now using -F-order, the *second* dimension will have stride `3`, and the *third* -dimension will have stride `3*5=15`. 
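The stride arithmetic for an array of size `[3, 5, 7]` can be written out as a Python sketch. The function names are hypothetical and are not from any particular library. C-order takes the product of the sizes to the right of each dimension, and F-order takes the product of the sizes to the left. The flat position of an index is the sum of each coordinate times its stride.

```python
from math import prod

def c_strides(shape):
    """C-order: the last index has stride 1."""
    return [prod(shape[d + 1:]) for d in range(len(shape))]

def f_strides(shape):
    """F-order: the first index has stride 1."""
    return [prod(shape[:d]) for d in range(len(shape))]

def flat_index(index, strides):
    """Offset of a multi-dimensional index in the flattened array."""
    return sum(i * s for i, s in zip(index, strides))

shape = [3, 5, 7]
print(c_strides(shape))  # [35, 7, 1]
print(f_strides(shape))  # [1, 3, 15]

index = (2, 1, 3)
print(flat_index(index, c_strides(shape)))  # 2*35 + 1*7 + 3*1 = 80
print(flat_index(index, f_strides(shape)))  # 2*1 + 1*3 + 3*15 = 50
```

The same index `(2, 1, 3)` names the same element under both conventions. Only its position in the flat array changes.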
+The following sections require adding an interpretation or semantics to +the dimensions of a multi-dimensional array. We'll discuss interpreting +arrays as matrices and images. ### row- and column-major -Matrices are often represented as a 2D array of numbers. Horizontal +Matrices are usually represented as a 2D array of numbers. Horizontal groupings of these numbers are called "rows" and vertical groupings are called "columns." In mathematics, the entries of a matrix $A$ are denoted $a_{ij}$. Where rows of the matrix are indexed by $i$ the @@ -171,3 +199,4 @@ arrays the same way when flattened. 1) [nrrd axis ordering](https://teem.sourceforge.net/nrrd/format.html#general.4) 2) [n5 ordering discussion](https://github.com/saalfeldlab/n5/issues/31) +3) [multi-dimensional arrays in vigra](http://ukoethe.github.io/vigra/doc-release/vigranumpy/index.html#more-on-the-motivation-and-use-of-axistags) From 6f8f165998c6a62444487fede29cac085ab0fd9d Mon Sep 17 00:00:00 2001 From: Michael Innerberger Date: Thu, 14 Dec 2023 10:04:52 -0500 Subject: [PATCH 07/18] Fix a few typos --- ArrayOrder.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/ArrayOrder.md b/ArrayOrder.md index bfc8e82..5fe448f 100644 --- a/ArrayOrder.md +++ b/ArrayOrder.md @@ -6,7 +6,7 @@ the complete multidimensional array and are stored as a 1D array of values, called a "flattened" array. Indexing into a multi-dimensional array is done with an ordered tuple, -each element of which indexes into one of the arrays dimensions. +each element of which indexes into one of the array's dimensions. Elements of this tuple are called "coordinates." For example, the tuple `(i,j,k)` indexes a three-dimensional array, and `i`, `j`, and `k` are its coordinates. We will call `i` the "left" or "first" index, and `k` the @@ -20,10 +20,10 @@ flattened 1D array. 
### Reshaping and stride -One can think of reshaping a 1D as a recursive process of grouping a +One can think of reshaping a 1D array as a recursive process of grouping a number of adjacent elements. -A (n-1)-dimnensional array is reshaped to an n-dimensional array by +An (n-1)-dimensional array is reshaped to an n-dimensional array by grouping a number adjacent elements belonging to the same dimension. * **Define:** the "size" of a dimension is the number of grouped elements. @@ -66,7 +66,7 @@ second stride): `(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11), (12, 13), (14, 15), (16, 17), (18, 19), (20, 21), (22, 23)` -Next group elements of the new list (which are themselves groups) into +Next, group elements of the new list (which are themselves groups) into groups of 3 (the largest stride). `[(0, 1), (2, 3), (4, 5)], [(6, 7), (8, 9), (10, 11)], [(12, 13), (14, 15), (16, 17)], [(18, 19), (20, 21), (22, 23)]` @@ -113,9 +113,9 @@ arrays as matrices and images. Matrices are usually represented as a 2D array of numbers. Horizontal groupings of these numbers are called "rows" and vertical groupings are called "columns." In mathematics, the entries of a matrix $A$ are -denoted $a_{ij}$. Where rows of the matrix are indexed by $i$ the -"first", or "left" index, and columns of the matrix are indexed by $j$, -the "second", or "right" index. +denoted $a_{ij}$. Rows of the matrix are indexed by $i$, the +"first" or "left" index, and columns of the matrix are indexed by $j$, +the "second" or "right" index. * **Universal Matrix Convention**: Left indexes refer to rows, right indexes refer to columns. @@ -128,11 +128,11 @@ or right) refers to rows vs columns for matrices in mathematics. * **Define:** Arrays storing matrices in "column-major" give rows stride 1. The names come from the fact that varying the column index changes the -position along a single row +position along a single row. 
-* **Consequence:** Given matrix storage conventions C-order storage is +* **Consequence:** Given matrix storage conventions, C-order storage is equivalent to "row-major". -* **Consequence:** Given matrix storage conventions F-order storage is +* **Consequence:** Given matrix storage conventions, F-order storage is equivalent to "column-major". #### example @@ -173,7 +173,7 @@ stride of 1.) ### cartesian coordinates In contrast to the matrix row/column index convention, cartesian -coordinates label the horizontal and vertical dimensions `x` and `y` +coordinates label the horizontal and vertical dimensions `x` and `y`, respectively. Referencing positions in the 2D plane is done using ordered two-tuples `(x,y)`, where `x` is conventionally the left-index and `y` is the right-index. Using cartesian coordinates, varying From d680393ffb39c99aa64c5cc0199c9a5dd39c9447 Mon Sep 17 00:00:00 2001 From: Michael Innerberger Date: Thu, 14 Dec 2023 10:10:13 -0500 Subject: [PATCH 08/18] Unify presentation of arrays --- ArrayOrder.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/ArrayOrder.md b/ArrayOrder.md index 5fe448f..14bb34e 100644 --- a/ArrayOrder.md +++ b/ArrayOrder.md @@ -40,7 +40,7 @@ The stride of a dimension is the product of sizes of all previous dimensions. Suppose we have this flat array: -`0, 1, 2, 3, 4, 5` +`[0, 1, 2, 3, 4, 5]` and two dimensions having sizes 3 and 2. The first stride is always 1. The second stride is the previous dimenions' size: 3 in this example. So @@ -48,7 +48,7 @@ our strides are 1 and 3. There is no grouping to be done for a dimensions of stride one, so the first and only step is to group elements into groups of 3 (the larger stride): -`(0, 1, 2), (3, 4, 5)` +`[(0, 1, 2), (3, 4, 5)]`
@@ -56,7 +56,7 @@ elements into groups of 3 (the larger stride): Suppose we have this flat array: -`0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23` +`[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]` and three dimensions having sizes 2, 3, and 4. Their strides are 1, 2, and 6 @@ -64,12 +64,12 @@ where `2*3 = 6`. There is no grouping to be done for a dimensions of stride 1, so the first step is to join elements into groups of 2 (the second stride): -`(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11), (12, 13), (14, 15), (16, 17), (18, 19), (20, 21), (22, 23)` +`[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11), (12, 13), (14, 15), (16, 17), (18, 19), (20, 21), (22, 23)]` Next, group elements of the new list (which are themselves groups) into groups of 3 (the largest stride). -`[(0, 1), (2, 3), (4, 5)], [(6, 7), (8, 9), (10, 11)], [(12, 13), (14, 15), (16, 17)], [(18, 19), (20, 21), (22, 23)]` +`[((0, 1), (2, 3), (4, 5)), ((6, 7), (8, 9), (10, 11)), ((12, 13), (14, 15), (16, 17)), ((18, 19), (20, 21), (22, 23))]` Notice: * The element adjacent to `0` in the inner group is `1`, hence stride `1`. @@ -89,14 +89,14 @@ These terms come from conventions for storing arrays in the C and Fortran progra ### array size The size of a multidimensional array is described by a list of sizes per -dimension. For example: `[ 3, 5, 7 ].` In this example, the *first* +dimension. For example: `[3, 5, 7].` In this example, the *first* dimension has size `3`, the *last* dimension has size `7`. If this array is indexed using C-order, then the last index has stride `1`. As a result, the middle index will have stride `7`, and the *first* dimension will have stride `7*5 = 35`. -Consider again an array of size `[ 3, 5, 7 ]`, but using F-order +Consider again an array of size `[3, 5, 7]`, but using F-order indexing. Again, the *first* dimension has size `3`, the *last* dimension has size `7`. 
Now, however, using F-order, the *first* dimension will have stride `1`, the *second* dimension will have From 695e01f46833eab2b4f7f6e235462040438bdbb7 Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Tue, 19 Dec 2023 17:14:09 -0500 Subject: [PATCH 09/18] array order: fix typos, add language indicating limited scope --- ArrayOrder.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/ArrayOrder.md b/ArrayOrder.md index 14bb34e..ee4db85 100644 --- a/ArrayOrder.md +++ b/ArrayOrder.md @@ -1,9 +1,9 @@ ## Multi-dimensional array indexing Zarr stores multi-dimensional arrays into regularly sized chunks. -Chunks are themselves multi-dimensional arrays of a smaller size than -the complete multidimensional array and are stored as a 1D array of -values, called a "flattened" array. +Chunks are themselves multi-dimensional arrays of a (usually) smaller +size than the complete multi-dimensional array and are stored as a 1D +array of values, called a "flattened" array. Indexing into a multi-dimensional array is done with an ordered tuple, each element of which indexes into one of the array's dimensions. @@ -12,7 +12,9 @@ Elements of this tuple are called "coordinates." For example, the tuple its coordinates. We will call `i` the "left" or "first" index, and `k` the "right" or "last" index. -The only valid coordinates for arrays are the non-negative integers. +In this document, we will consider the non-negative integers as the only +valid coordinates for arrays (though this can be different in other +contexts). The following will describe conventions for mapping multi-dimensional indexes (ordered tuples of integers) to the scalar integer index of the @@ -43,7 +45,7 @@ Suppose we have this flat array: `[0, 1, 2, 3, 4, 5]` and two dimensions having sizes 3 and 2. The first stride is always 1. -The second stride is the previous dimenions' size: 3 in this example. So +The second stride is the previous dimensions' size: 3 in this example. 
So our strides are 1 and 3. There is no grouping to be done for a dimensions of stride one, so the first and only step is to group elements into groups of 3 (the larger stride): @@ -88,7 +90,7 @@ These terms come from conventions for storing arrays in the C and Fortran progra ### array size -The size of a multidimensional array is described by a list of sizes per +The size of a multi-dimensional array is described by a list of sizes per dimension. For example: `[3, 5, 7].` In this example, the *first* dimension has size `3`, the *last* dimension has size `7`. From caf26440a0a8139a6a86ccbf75bf6911c52105c4 Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Tue, 19 Dec 2023 17:14:27 -0500 Subject: [PATCH 10/18] array order: add note on (co)lexicographic order --- ArrayOrder.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/ArrayOrder.md b/ArrayOrder.md index ee4db85..7bc273b 100644 --- a/ArrayOrder.md +++ b/ArrayOrder.md @@ -88,6 +88,9 @@ Notice: These terms come from conventions for storing arrays in the C and Fortran programming languages. +C-order is equivalent to "lexicographical order". F-order is equivalent to "co-lexicographical +order". 
+ ### array size The size of a multi-dimensional array is described by a list of sizes per From 1fdca26502aa61539df7c2a8087fc10bf59354bf Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Wed, 1 May 2024 15:30:15 -0400 Subject: [PATCH 11/18] refactor: make array order post --- ArrayOrder.md => _posts/2024-05-01-array-order.markdown | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename ArrayOrder.md => _posts/2024-05-01-array-order.markdown (100%) diff --git a/ArrayOrder.md b/_posts/2024-05-01-array-order.markdown similarity index 100% rename from ArrayOrder.md rename to _posts/2024-05-01-array-order.markdown From 2d6cd37cbb7e38aa9fe5c5bf01f59b5fcd330681 Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Wed, 22 May 2024 14:01:56 -0400 Subject: [PATCH 12/18] rewrite of array order post (draft) --- _posts/2024-05-01-array-order.markdown | 207 -------------------- posts/2024-05-22-multi-dim-arrays.qmd | 261 +++++++++++++++++++++++++ 2 files changed, 261 insertions(+), 207 deletions(-) delete mode 100644 _posts/2024-05-01-array-order.markdown create mode 100644 posts/2024-05-22-multi-dim-arrays.qmd diff --git a/_posts/2024-05-01-array-order.markdown b/_posts/2024-05-01-array-order.markdown deleted file mode 100644 index 7bc273b..0000000 --- a/_posts/2024-05-01-array-order.markdown +++ /dev/null @@ -1,207 +0,0 @@ -## Multi-dimensional array indexing - -Zarr stores multi-dimensional arrays into regularly sized chunks. -Chunks are themselves multi-dimensional arrays of a (usually) smaller -size than the complete multi-dimensional array and are stored as a 1D -array of values, called a "flattened" array. - -Indexing into a multi-dimensional array is done with an ordered tuple, -each element of which indexes into one of the array's dimensions. -Elements of this tuple are called "coordinates." For example, the tuple -`(i,j,k)` indexes a three-dimensional array, and `i`, `j`, and `k` are -its coordinates. 
We will call `i` the "left" or "first" index, and `k` the -"right" or "last" index. - -In this document, we will consider the non-negative integers as the only -valid coordinates for arrays (though this can be different in other -contexts). - -The following will describe conventions for mapping multi-dimensional -indexes (ordered tuples of integers) to the scalar integer index of the -flattened 1D array. - -### Reshaping and stride - -One can think of reshaping a 1D array as a recursive process of grouping a -number of adjacent elements. - -An (n-1)-dimensional array is reshaped to an n-dimensional array by -grouping a number adjacent elements belonging to the same dimension. - -* **Define:** the "size" of a dimension is the number of grouped elements. - -* **Define:** the stride of a dimension is the (positive) step in the - flat array that corresponds to the adjacent element along that dimension. - -The stride of a dimension is the product of sizes of all previous dimensions. - -* **Define:** the "inner" dimension is the dimension with a stride of 1. -* **Define:** the "outer" dimension is the dimension with the largest stride. - -#### examples - -Suppose we have this flat array: - -`[0, 1, 2, 3, 4, 5]` - -and two dimensions having sizes 3 and 2. The first stride is always 1. -The second stride is the previous dimensions' size: 3 in this example. So -our strides are 1 and 3. There is no grouping to be done for a -dimensions of stride one, so the first and only step is to group -elements into groups of 3 (the larger stride): - -`[(0, 1, 2), (3, 4, 5)]` - -
- -a larger example - -Suppose we have this flat array: - -`[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]` - - -and three dimensions having sizes 2, 3, and 4. Their strides are 1, 2, and 6 -where `2*3 = 6`. There is no grouping to be done for a dimensions of stride 1, -so the first step is to join elements into groups of 2 (the -second stride): - -`[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11), (12, 13), (14, 15), (16, 17), (18, 19), (20, 21), (22, 23)]` - -Next, group elements of the new list (which are themselves groups) into -groups of 3 (the largest stride). - -`[((0, 1), (2, 3), (4, 5)), ((6, 7), (8, 9), (10, 11)), ((12, 13), (14, 15), (16, 17)), ((18, 19), (20, 21), (22, 23))]` - -Notice: -* The element adjacent to `0` in the inner group is `1`, hence stride `1`. -* The element adjacent to `0` in the intermediate grouping is `2`, hence stride `2`. -* The element adjacent to `0` in the outer grouping is `6`, hence stride `6`. - -
- - -### C- and F-order - -* **Define:** C-order indexes multi-dimensional arrays such that the **last** index has stride 1. -* **Define:** F-order indexes multi-dimensional arrays such that the **first** index has stride 1. - -These terms come from conventions for storing arrays in the C and Fortran programming languages. - -C-order is equivalent to "lexicographical order". F-order is equivalent to "co-lexicographical -order". - -### array size - -The size of a multi-dimensional array is described by a list of sizes per -dimension. For example: `[3, 5, 7].` In this example, the *first* -dimension has size `3`, the *last* dimension has size `7`. - -If this array is indexed using C-order, then the last index has stride -`1`. As a result, the middle index will have stride `7`, and the *first* -dimension will have stride `7*5 = 35`. - -Consider again an array of size `[3, 5, 7]`, but using F-order -indexing. Again, the *first* dimension has size `3`, the *last* -dimension has size `7`. Now, however, using F-order, the -*first* dimension will have stride `1`, the *second* dimension will have -stride `3`, and the *third* dimension will have stride `3*5=15`. - -## Interpretation of multi-dimensional arrays - -The following sections require adding an interpretation or semantics to -the dimensions of a multi-dimensional array. We'll discuss interpreting -arrays as matrices and images. - -### row- and column-major - -Matrices are usually represented as a 2D array of numbers. Horizontal -groupings of these numbers are called "rows" and vertical groupings are -called "columns." In mathematics, the entries of a matrix $A$ are -denoted $a_{ij}$. Rows of the matrix are indexed by $i$, the -"first" or "left" index, and columns of the matrix are indexed by $j$, -the "second" or "right" index. - -* **Universal Matrix Convention**: Left indexes refer to rows, - right indexes refer to columns. - -The terms row- and column-major derive for the storage of matrices. 
-Defining these terms first depends on first agreeing which index (left -or right) refers to rows vs columns for matrices in mathematics. - -* **Define:** Arrays storing matrices in "row-major" give columns stride 1. -* **Define:** Arrays storing matrices in "column-major" give rows stride 1. - -The names come from the fact that varying the column index changes the -position along a single row. - -* **Consequence:** Given matrix storage conventions, C-order storage is - equivalent to "row-major". -* **Consequence:** Given matrix storage conventions, F-order storage is - equivalent to "column-major". - -#### example - -As a result of the *Universal Matrix Convention* the size of a matrix -with `2` rows and `3` columns is `[2, 3]` for both C- and F-orderings. -Consider: - -``` - column 0 column 1 column 2 - row 0 [ 0 1 2 ] - row 1 [ 3 4 5 ] -``` - -* The flat C-ordered array will be: `[0, 1, 2, 3, 4, 5]` -* The flat F-ordered array will be: `[0, 3, 1, 4, 2, 5]` - -To reiterate, the multi-dimensional indexes for both C- and F-order are: - -``` - column 0 column 1 column 2 - row 0 [ (0,0) (0,1) (0,2) ] - row 1 [ (1,0) (1,1) (1,2) ] -``` - -because, the row index is *always* the left-most index. - - -### image analysis - -Most formats for storing image files store data such that the -"horizontal axis" / rows have a smaller stride than the "vertical axis" -/ columns. (Note: while rows have smaller stride than columns, it is -common for rows NOT to have stride 1, for example when using -"interleaved" color components, the "color" dimension often will have a -stride of 1.) - -### cartesian coordinates - -In contrast to the matrix row/column index convention, cartesian -coordinates label the horizontal and vertical dimensions `x` and `y`, -respectively. Referencing positions in the 2D plane is done using -ordered two-tuples `(x,y)`, where `x` is conventionally the left-index -and `y` is the right-index. 
Using cartesian coordinates, varying -the left dimensions varies horizontal position, and varying the right -dimension varies the vertical position. - -Applications and workflows that make use of image geometry most commonly -use cartesian coordinates. - -## Conclusions - -If a convention is such that the meaning / interpretation of the index -in a particular position (left / rightmost) is strong, then C- and F- -order will agree on the ordering of an array's dimensions, but will -store the arrays differently when flattened. - -If a convention is such that the meaning / interpretation of a -particular stride (fastest / slowest) is strong, then C- and F- order -will dis-agree on the ordering of an array's dimensions, but will store the -arrays the same way when flattened. - -### refs - -1) [nrrd axis ordering](https://teem.sourceforge.net/nrrd/format.html#general.4) -2) [n5 ordering discussion](https://github.com/saalfeldlab/n5/issues/31) -3) [multi-dimensional arrays in vigra](http://ukoethe.github.io/vigra/doc-release/vigranumpy/index.html#more-on-the-motivation-and-use-of-axistags) diff --git a/posts/2024-05-22-multi-dim-arrays.qmd b/posts/2024-05-22-multi-dim-arrays.qmd new file mode 100644 index 0000000..a622dd1 --- /dev/null +++ b/posts/2024-05-22-multi-dim-arrays.qmd @@ -0,0 +1,261 @@ +--- +title: "Multi-dimensional array indexing" +description: "Post description" +author: "John Bogovic" +date: "5/10/2024" +--- + +## Recommendations and summary + +Conventions for indexing and storing multi-dimensional data vary across programming languages and software libraries. +To make data and metadata sharing seamless, software and storage formats should clearly and explicitly communicate: + +1. the relationship of array indexes to memory layout +2. the semantic meaning of array dimensions +3. 
the relationship of coordinate data to array dimensions + +Recommendations (1) and (2) may be achieved either by referring to array dimensions in a clear and consistent order or with +labels that are independent of order (e.g. `['x', 'y', 'z', 't', ...]`, see [xarray](https://docs.xarray.dev/en/latest/getting-started-guide/why-xarray.html)) + +Recommendation (3) should be achieved by explicitly documenting the array indexing convention (most often C-order or F-order). + +## Multi-dimensional array indexing and memory layout + +Elements of an n-dimensional array are indexed by an ordered n-tuple, each element of which indexes into one of the array's +dimensions. We will also call elements of this tuple "indexes." For example, the tuple `(i,j,k)` indexes a three-dimensional +array where `i` is the "first" index, and `k` is the "last" index. Here, we will consider only the non-negative integers as +valid indexes for arrays, though different contexts may use a different index set. + +Multi-dimensional arrays are often stored as one-dimensional (1D), or "flat," arrays that are interpreted, or "reshaped," into +a multi-dimensional array by mapping the n-tuple of coordinates to a single index into the 1D array. The two most common +conventions for this mapping are C-order and F-order. + +In this article, we will refer to n-dimensional arrays as simply "arrays" and 1D arrays as "flat." + + +#### Reshaping arrays and stride + +One can think of reshaping a 1D array as a recursive process of grouping a number of adjacent elements. An n-dimensional array +can be reshaped to an (n+1)-dimensional array by grouping a number adjacent elements belonging to the same dimension. + +* **Define:** the *stride* of a dimension is the (positive) step in the flat array that corresponds to the adjacent element + along that dimension. + +The stride of a dimension is the product of sizes of all previous dimensions. 
+ +* **Define:** the "fastest" or "inner" dimension is the dimension with a stride of 1. +* **Define:** the "slowest" or "outer" dimension is the dimension with the largest stride. + +* **Define:** the *size* of a dimension is the number of grouped elements. + +The size of an n-dimensional array is described by a list of sizes per dimension. For example: `[3, 5, 7].` In this example, +the *first* dimension has size `3`, the *last* dimension has size `7`. + +::: {.callout-tip collapse="true" appearance="minimal"} +# Example 1 + +Suppose we have this flat array: + +`[0, 1, 2, 3, 4, 5]` + +and two dimensions having sizes 3 and 2. The first stride is always 1. The second stride is the previous dimensions' size: 3 in +this example. So our strides are 1 and 3. There is no grouping to be done for a dimensions of stride one, so the first and only +step is to group elements into groups of 3 (the larger stride): + +`[(0, 1, 2), (3, 4, 5)]` + +::: + +::: {.callout-tip collapse="true" appearance="minimal"} +# Example 2 + +Suppose we have this flat array: + +`[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]` + + +and three dimensions having sizes 2, 3, and 4. Their strides are 1, 2, and 6 where `2*3 = 6`. There is no grouping to be done +for a dimensions of stride 1, so the first step is to join elements into groups of 2 (the second stride): + +`[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11), (12, 13), (14, 15), (16, 17), (18, 19), (20, 21), (22, 23)]` + +Next, group elements of the new list (which are themselves groups) into groups of 3 (the largest stride). + +`[((0, 1), (2, 3), (4, 5)), ((6, 7), (8, 9), (10, 11)), ((12, 13), (14, 15), (16, 17)), ((18, 19), (20, 21), (22, 23))]` + +Notice: +* The element adjacent to `0` in the inner group is `1`, hence stride `1`. +* The element adjacent to `0` in the intermediate grouping is `2`, hence stride `2`. +* The element adjacent to `0` in the outer grouping is `6`, hence stride `6`. 
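The stride rule and the grouping procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not taken from any library. It assumes plain Python lists and the first-dimension-fastest convention used in this section. The helper names `strides_from_sizes` and `group` are hypothetical.

```python
from math import prod


def strides_from_sizes(sizes):
    """Element strides when the first dimension varies fastest.

    Each stride is the product of the sizes of all previous dimensions,
    so the first stride is always 1 (the product of an empty list).
    """
    return [prod(sizes[:d]) for d in range(len(sizes))]


def group(flat, sizes):
    """Reshape a flat sequence by repeatedly grouping adjacent elements.

    Grouping starts with the first (fastest) dimension. The last dimension
    needs no grouping step: it is simply the length of the final list.
    """
    items = list(flat)
    for size in sizes[:-1]:
        # Collect runs of `size` consecutive items into tuples
        items = [tuple(items[i:i + size]) for i in range(0, len(items), size)]
    return items


print(strides_from_sizes([2, 3, 4]))  # [1, 2, 6]
nested = group(range(24), [2, 3, 4])
print(nested[0])  # ((0, 1), (2, 3), (4, 5))
```

Note that indexing the nested result lists the slowest dimension first: `nested[k][j][i]` is the element at flat position `i*1 + j*2 + k*6`, so `nested[3][1][0]` is `20`.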
+::: + + +### C- and F-order + +The terms C-order and F-order come from conventions for indexing arrays in the C and Fortran programming languages. + +**TODO** example + +If this array is indexed using C-order, then the last index has stride `1`. As a result, the middle index will have stride `7`, +and the *first* dimension will have stride `7*5 = 35`. + +Consider again an array of size `[3, 5, 7]`, but using F-order indexing. Again, the *first* dimension has size `3`, the *last* +dimension has size `7`. Now, however, using F-order, the *first* dimension will have stride `1`, the *second* dimension will +have stride `3`, and the *third* dimension will have stride `3*5 = 15`. + +## Dimension semantics + +The dimensions of a multi-dimensional array sometimes come with additional semantics depending +on what data they store. We discuss interpreting arrays as matrices and images. + +### Matrices + +Matrices are often represented as a 2D array of numbers. Horizontal groupings of these numbers are called "rows" and vertical +groupings are called "columns." In mathematics, the entries of a matrix $A$ are denoted $a_{ij}$. + +::: {.callout-note appearance="minimal"} +# Matrix indexing convention + +The first index of a matrix ($i$) refers to rows, the second index ($j$) refers to columns. +::: + +::: {.callout-tip collapse=true appearance="minimal"} +# Row- and column-major + +The terms row- and column-major derive for the storage of matrices. Defining these terms first depends on first agreeing which +index (first or last) refers to rows vs columns for matrices in mathematics. + +* **Consequence:** Given matrix indexing conventions, C-order storage is equivalent to "row-major". +* **Consequence:** Given matrix indexing conventions, F-order storage is equivalent to "column-major". + +#### example + +As a result of the *Matrix Indexing Convention* the size of a matrix with `2` rows and `3` columns is `[2, 3]` for both C- and +F-orderings. 
Consider: + +``` + column 0 column 1 column 2 + row 0 [ 0 1 2 ] + row 1 [ 3 4 5 ] +``` + +* The flat C-ordered array will be: `[0, 1, 2, 3, 4, 5]` +* The flat F-ordered array will be: `[0, 3, 1, 4, 2, 5]` + +To reiterate, the multi-dimensional indexes for both C- and F-order are: + +``` + column 0 column 1 column 2 + row 0 [ (0,0) (0,1) (0,2) ] + row 1 [ (1,0) (1,1) (1,2) ] +``` + +because, the row index is *always* the first index. + +::: + +For matrices, C- and F- order indexing will agree on the ordering of an array's indexes and dimensions, but will store the +arrays differently when flattened. This is because the matrix indexing convention attaches semantics (row/column) to +the index position (first/second). + +### Images + +Typically, 2D images are stored as arrays where two dimensions vary the horizontal and vertical positions of the samples, and as +a result these dimensions should be displayed horizontally and vertically, respectively. Most formats for storing "natural" +images store data such that the "horizontal axis" / rows have a smaller stride than the "vertical axis" / columns. Note: while +rows have smaller stride than columns, it is common for rows not to have stride 1, for example when using "interleaved" color +components, the "color" dimension will have a stride of 1. + +Biomedical images do not typically have "horizonal" or "vertical" dimensions, but may have other semantics (e.g., anatomical, or +related to the imaging system). Software and storage formats should not use stride as a proxy for the semantics of a dimension. + +For images, C- and F-order will historically disagree on the ordering of an array's indexes and dimensions, but will store the +arrays the same way when flattened. This is because the convention for natural images historically +associates dimension semantics to the memory layout (the fastest dimension has horizontal semantics). 
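The last point can be checked with NumPy. The sketch below assumes a hypothetical 2 x 3 pixel RGB image with interleaved color stored as `uint16`. It describes one flat buffer in two ways: as a C-order array indexed `[row, column, channel]` and as an F-order array indexed `[channel, column, row]`. NumPy reports strides in bytes, so the hypothetical helper `element_strides` divides by the item size to match the element-based strides used in this post.

```python
import numpy as np

height, width, channels = 2, 3, 3  # hypothetical tiny RGB image
# Interleaved samples R, G, B, R, G, B, ... (uint16: 2 bytes per element)
flat = np.arange(height * width * channels, dtype=np.uint16)

# C-order description: index as [row, column, channel]
c_view = flat.reshape((height, width, channels), order="C")
# F-order description of the same buffer: index as [channel, column, row]
f_view = flat.reshape((channels, width, height), order="F")


def element_strides(a):
    """Convert NumPy's byte strides into strides measured in elements."""
    return tuple(s // a.itemsize for s in a.strides)


print(element_strides(c_view))  # (9, 3, 1): color has stride 1 (last index)
print(element_strides(f_view))  # (1, 3, 9): color has stride 1 (first index)

# The same sample is reached with the indexes listed in opposite orders
assert c_view[1, 2, 0] == f_view[0, 2, 1]
# Flattening each view in its own order gives back the identical buffer
assert np.array_equal(c_view.ravel(order="C"), f_view.ravel(order="F"))
```

The shapes `(2, 3, 3)` and `(3, 3, 2)` disagree on dimension order, while the memory layout is the same. This is why a shape alone, without a stated indexing convention, does not say which dimension is horizontal.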
+ +## Coordinate data + +Coordinate data are data that refer to "locations" of the multidimensional array. They may be discrete, +and refer to specific samples, or continuous, refering to points "in-between" array locations. + +::: {.callout-tip collapse="true" appearance="minimal"} +# Coordinate data examples + +* Point annotations + * "Structure `A` is located at point `(x,y)`" +* Bounding boxes / ROIs + * "Crop the image to bounding box `(min_x, min_y, width, height)` +* Other collections of points + * "My neuron skeleton consists of points `[[z0, y0, x0], ..., [zN, yN, xN]]` + +::: + + + + +### cartesian coordinates + +In contrast to the matrix row/column index convention, cartesian +coordinates label the horizontal and vertical dimensions `x` and `y`, +respectively. Referencing positions in the 2D plane is done using +ordered two-tuples `(x,y)`, where `x` is conventionally the left-index +and `y` is the right-index. Using cartesian coordinates, varying +the left dimensions varies horizontal position, and varying the right +dimension varies the vertical position. + +Applications and workflows that make use of image geometry most commonly use cartesian coordinates. + +## Definitions + +#### **Size** (of a dimension) +The number of unique indexes for a dimension. + +#### **Stride** (of a dimension) +The (positive) step in the flat array that corresponds to the adjacent element along that dimension. + +#### **C-order indexing** +Tuples for indexing multi-dimensional arrays are ordered such that the **last** coordinate has stride 1. + +#### **F-order indexing** +Tuples for indexing multi-dimensional arrays are ordered such that the **first** coordinate has stride 1. + +#### **Fastest dimension** ("inner") +The fastest (or inner) dimension of an array has stride 1 + +#### **Slowest dimension** ("outer") +The slowest (or outer) dimension of an array with the largest stride. 
+ +#### **Row-major** +Arrays storing matrices in “row-major” store adjacent elements in the same row in adjacent memory. This is equivalent to saying +that the column dimension has stride 1. + +#### **Column-major** +Arrays storing matrices in “column-major” store adjacent elements in the same column in adjacent memory. This is equivalent to +saying that the row dimension has stride 1. + + +## References + +1) [hdf5 dataspaces](https://docs.hdfgroup.org/hdf5/develop/_h5_s__u_g.html#sec_dataspace) +2) [zarr arrays](https://zarr-specs.readthedocs.io/en/latest/v2/v2.0.html#arrays) +3) [nrrd axis ordering](https://teem.sourceforge.net/nrrd/format.html#general.4) +4) [n5 ordering discussion](https://github.com/saalfeldlab/n5/issues/31) +5) [multi-dimensional arrays in vigra](http://ukoethe.github.io/vigra/doc-release/vigranumpy/index.html#more-on-the-motivation-and-use-of-axistags) + +## Appendix + +### Programming languages + +| C-order | F-order | +| ------- | ------- | +| C | Fortran | +| C++ | Java | +| Python | Matlab | + +### Related terms + +| C-order | F-order | +| --------------------- | ------------------------ | +| lexicographical order | co-lexicographical order | +| row-major | column-major | +| matrix indexing | cartesian indexing | From 75125036812586a48f66f47da355db610c590562 Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Thu, 6 Jun 2024 11:06:07 -0400 Subject: [PATCH 13/18] rm definitions --- posts/2024-05-22-multi-dim-arrays.qmd | 28 --------------------------- 1 file changed, 28 deletions(-) diff --git a/posts/2024-05-22-multi-dim-arrays.qmd b/posts/2024-05-22-multi-dim-arrays.qmd index a622dd1..af47df2 100644 --- a/posts/2024-05-22-multi-dim-arrays.qmd +++ b/posts/2024-05-22-multi-dim-arrays.qmd @@ -205,34 +205,6 @@ dimension varies the vertical position. Applications and workflows that make use of image geometry most commonly use cartesian coordinates. 
-## Definitions - -#### **Size** (of a dimension) -The number of unique indexes for a dimension. - -#### **Stride** (of a dimension) -The (positive) step in the flat array that corresponds to the adjacent element along that dimension. - -#### **C-order indexing** -Tuples for indexing multi-dimensional arrays are ordered such that the **last** coordinate has stride 1. - -#### **F-order indexing** -Tuples for indexing multi-dimensional arrays are ordered such that the **first** coordinate has stride 1. - -#### **Fastest dimension** ("inner") -The fastest (or inner) dimension of an array has stride 1 - -#### **Slowest dimension** ("outer") -The slowest (or outer) dimension of an array with the largest stride. - -#### **Row-major** -Arrays storing matrices in “row-major” store adjacent elements in the same row in adjacent memory. This is equivalent to saying -that the column dimension has stride 1. - -#### **Column-major** -Arrays storing matrices in “column-major” store adjacent elements in the same column in adjacent memory. This is equivalent to -saying that the row dimension has stride 1. - ## References From 52268068189ff93837aeeb2a1615626d71463113 Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Thu, 6 Jun 2024 11:06:44 -0400 Subject: [PATCH 14/18] authors and description --- posts/2024-05-22-multi-dim-arrays.qmd | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/posts/2024-05-22-multi-dim-arrays.qmd b/posts/2024-05-22-multi-dim-arrays.qmd index af47df2..ecab664 100644 --- a/posts/2024-05-22-multi-dim-arrays.qmd +++ b/posts/2024-05-22-multi-dim-arrays.qmd @@ -1,7 +1,14 @@ --- title: "Multi-dimensional array indexing" -description: "Post description" -author: "John Bogovic" +description: "Recommendations to ensure that array data and + index/coordinate data can be consistently interpreted + across software using different indexing conventions." 
+author: + - name: "John Bogovic" + - name: "Davis Bennett" + - name: "Michael Innerberger" + - name: "Stephan Saalfeld" + - name: "Virginia Scarlett" date: "5/10/2024" --- From 4d5e6b48b028f1292f86bf098c1e08e445a3d9f4 Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Thu, 6 Jun 2024 11:07:17 -0400 Subject: [PATCH 15/18] more examples and edits --- posts/2024-05-22-multi-dim-arrays.qmd | 103 ++++++++++++++++++-------- 1 file changed, 73 insertions(+), 30 deletions(-) diff --git a/posts/2024-05-22-multi-dim-arrays.qmd b/posts/2024-05-22-multi-dim-arrays.qmd index ecab664..ef16d0d 100644 --- a/posts/2024-05-22-multi-dim-arrays.qmd +++ b/posts/2024-05-22-multi-dim-arrays.qmd @@ -12,19 +12,43 @@ author: date: "5/10/2024" --- -## Recommendations and summary +## Summary and recommendations Conventions for indexing and storing multi-dimensional data vary across programming languages and software libraries. -To make data and metadata sharing seamless, software and storage formats should clearly and explicitly communicate: +Data sharing across software with different systems is technically simple *so long as the convention used is clearly +documented.* For example, one software library might store and save point coordinates as `(x,y,z)`, while another +stores them as `(z,y,x)`. Interoperability is easy (just "reverse the numbers") only if it is possible to determine +what convention was used to produce the data, and software can determine what convention is used only if it is +clearly and explicitly communicated. + +To enable seamless data and metadata sharing, software and storage formats should clearly and explicitly communicate: 1. the relationship of array indexes to memory layout 2. the semantic meaning of array dimensions 3. the relationship of coordinate data to array dimensions -Recommendations (1) and (2) may be achieved either by referring to array dimensions in a clear and consistent order or with -labels that are independent of order (e.g. 
`['x', 'y', 'z', 't', ...]`, see [xarray](https://docs.xarray.dev/en/latest/getting-started-guide/why-xarray.html)) +Ambiguities should be clarified by (1) explicitly documenting the array indexing convention (most often C-order or F-order), +and (2) referring to array dimensions and (3) coordinates with a consistent order and/or with labels that are +independent of order (e.g. `['x', 'y', 'z', 't', ...]`, see +[xarray](https://docs.xarray.dev/en/latest/getting-started-guide/why-xarray.html)) + +::: {.callout-tip collapse="true" appearance="minimal"} +# Examples + +These are provided as good examples, not as the only way to communicate the above information. + +#### Array size and dimension interpretation + +> 3D displacement fields are stored as 4D arrays, whose dimensions are ordered `[3,X,Y,Z]` where the first dimension contains +> displacement vector components, and varies fastest in memory (i.e. F-order). + +#### Point coordinates + +> Points are stored as 3D coordinates in a CSV file ordered `X,Y,Z` +> where the `X` dimension varies fastest in memory. + +::: + -Recommendation (3) should be achieved by explicitly documenting the array indexing convention (most often C-order or F-order). ## Multi-dimensional array indexing and memory layout @@ -50,6 +74,17 @@ can be reshaped to an (n+1)-dimensional array by grouping a number adjacent elem The stride of a dimension is the product of sizes of all previous dimensions. +::: {.callout-note collapse="true" appearance="minimal"} +# Note on "units" of stride + +In this article, we will express stride in terms of elements. That is, stride 1 means the adjacent element in memory, +no matter the size in bytes per element. + +In some other contexts, stride is expressed in terms of bytes. For example, in an array containing elements of type `float32`, +the smallest possible stride is 4 bytes: the next element is "4 bytes away". + +::: + * **Define:** the "fastest" or "inner" dimension is the dimension with a stride of 1.
* **Define:** the "slowest" or "outer" dimension is the dimension with the largest stride. @@ -59,22 +94,7 @@ The size of an n-dimensional array is described by a list of sizes per dimension the *first* dimension has size `3`, the *last* dimension has size `7`. ::: {.callout-tip collapse="true" appearance="minimal"} -# Example 1 - -Suppose we have this flat array: - -`[0, 1, 2, 3, 4, 5]` - -and two dimensions having sizes 3 and 2. The first stride is always 1. The second stride is the previous dimensions' size: 3 in -this example. So our strides are 1 and 3. There is no grouping to be done for a dimensions of stride one, so the first and only -step is to group elements into groups of 3 (the larger stride): - -`[(0, 1, 2), (3, 4, 5)]` - -::: - -::: {.callout-tip collapse="true" appearance="minimal"} -# Example 2 +# Example Suppose we have this flat array: @@ -91,6 +111,7 @@ Next, group elements of the new list (which are themselves groups) into groups o `[((0, 1), (2, 3), (4, 5)), ((6, 7), (8, 9), (10, 11)), ((12, 13), (14, 15), (16, 17)), ((18, 19), (20, 21), (22, 23))]` Notice: + * The element adjacent to `0` in the inner group is `1`, hence stride `1`. * The element adjacent to `0` in the intermediate grouping is `2`, hence stride `2`. * The element adjacent to `0` in the outer grouping is `6`, hence stride `6`. @@ -101,7 +122,23 @@ Notice: The terms C-order and F-order come from conventions for indexing arrays in the C and Fortran programming languages. 
-**TODO** example +* **C-order indexing:** the fastest dimension corresponds to the last index, the slowest dimension corresponds to the first index +* **F-order indexing:** the slowest dimension corresponds to the last index, the fastest dimension corresponds to the first index + + +::: {.callout-tip collapse="true" appearance="minimal"} +# Examples + +Using the same flat array as the above example: + +`[((0, 1), (2, 3), (4, 5)), ((6, 7), (8, 9), (10, 11)), ((12, 13), (14, 15), (16, 17)), ((18, 19), (20, 21), (22, 23))]` + +* The size of this array using C-order is: `[4, 3, 2]` +* The size of this array using F-order is: `[2, 3, 4]` +* The index of element `5` using C-order is: `[0][2][1]` +* The index of element `5` using F-order is: `[1][2][0]` + +::: If this array is indexed using C-order, then the last index has stride `1`. As a result, the middle index will have stride `7`, and the *first* dimension will have stride `7*5 = 35`. @@ -120,12 +157,16 @@ on what data they store. We discuss interpreting arrays as matrices and images. Matrices are often represented as a 2D array of numbers. Horizontal groupings of these numbers are called "rows" and vertical groupings are called "columns." In mathematics, the entries of a matrix $A$ are denoted $a_{ij}$. +**Recommendation:** Software should clearly communicate when arrays represent matrices, and follow the *Matrix indexing +convention*. Software and documentation should use the terms "row-major" and "column-major" only when referring to matrices. + ::: {.callout-note appearance="minimal"} # Matrix indexing convention The first index of a matrix ($i$) refers to rows, the second index ($j$) refers to columns. ::: + ::: {.callout-tip collapse=true appearance="minimal"} # Row- and column-major @@ -167,14 +208,16 @@ the index position (first/second).
### Images -Typically, 2D images are stored as arrays where two dimensions vary the horizontal and vertical positions of the samples, and as -a result these dimensions should be displayed horizontally and vertically, respectively. Most formats for storing "natural" -images store data such that the "horizontal axis" / rows have a smaller stride than the "vertical axis" / columns. Note: while -rows have smaller stride than columns, it is common for rows not to have stride 1, for example when using "interleaved" color -components, the "color" dimension will have a stride of 1. +Two-dimensional images are often stored as arrays where two dimensions vary the horizontal and vertical positions of the +samples, and as a result these dimensions should be displayed horizontally and vertically, respectively. Most formats for +storing "natural" images store data such that the "horizontal axis" / rows have a smaller stride than the "vertical axis" / +columns. Note: while rows have smaller stride than columns, it is common for rows not to have stride 1, for example when using +"interleaved" color components, the "color" dimension will have a stride of 1. Biomedical images do not typically have +"horizontal" or "vertical" dimensions, but may have other semantics (e.g., anatomical, or related to the imaging system). -Biomedical images do not typically have "horizonal" or "vertical" dimensions, but may have other semantics (e.g., anatomical, or -related to the imaging system). Software and storage formats should not use stride as a proxy for the semantics of a dimension. +**Recommendation:** Software and storage formats should be explicit about any semantics that are attached to dimensions and +not use stride as a proxy for or to imply semantics. Documentation may specify semantics in terms of fastest/slowest dimension, +but must explicitly communicate that fact. 
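One way to follow this recommendation in code is to carry explicit axis labels with the data and rearrange by label instead of by assumed position or stride. The sketch below assumes that the labels travel with the data (for example, in metadata or a CSV header, as in the point-coordinate example in the summary). The function `reorder_point` is hypothetical. It matches components by label rather than assuming that one convention is simply the reverse of the other.

```python
def reorder_point(point, from_axes, to_axes):
    """Return `point` with its components rearranged from `from_axes` to `to_axes`.

    Axes are matched by label, so the result does not depend on any assumed
    memory layout or on which dimension happens to vary fastest.
    """
    if len(point) != len(from_axes):
        raise ValueError("point and from_axes must have the same length")
    if len(set(from_axes)) != len(from_axes):
        raise ValueError(f"duplicate axis labels: {from_axes}")
    if sorted(from_axes) != sorted(to_axes):
        raise ValueError(f"axis labels differ: {from_axes} vs {to_axes}")
    by_label = dict(zip(from_axes, point))
    return tuple(by_label[label] for label in to_axes)


# A point written as (x, y, z), read by software that expects (z, y, x)
print(reorder_point((10.0, 20.0, 30.0), ("x", "y", "z"), ("z", "y", "x")))
```

Because the labels are checked, a mismatch (for example, data labeled `c, x, y, z` read as `x, y, z, t`) raises an error instead of silently producing transposed coordinates.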
For images, C- and F-order will historically disagree on the ordering of an array's indexes and dimensions, but will store the arrays the same way when flattened. This is because the convention for natural images historically @@ -183,7 +226,7 @@ associates dimension semantics to the memory layout (the fastest dimension has h ## Coordinate data Coordinate data are data that refer to "locations" of the multidimensional array. They may be discrete, -and refer to specific samples, or continuous, refering to points "in-between" array locations. +and refer to specific samples, or continuous, referring to points "in-between" array locations. ::: {.callout-tip collapse="true" appearance="minimal"} # Coordinate data examples From 94c960b76067e1ced9189d6cfe5ee1575991b93e Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Mon, 24 Jun 2024 15:19:38 -0400 Subject: [PATCH 16/18] add Mark as author to multi-dim arrays --- posts/2024-05-22-multi-dim-arrays.qmd | 1 + 1 file changed, 1 insertion(+) diff --git a/posts/2024-05-22-multi-dim-arrays.qmd b/posts/2024-05-22-multi-dim-arrays.qmd index ef16d0d..ad2dc08 100644 --- a/posts/2024-05-22-multi-dim-arrays.qmd +++ b/posts/2024-05-22-multi-dim-arrays.qmd @@ -7,6 +7,7 @@ author: - name: "John Bogovic" - name: "Davis Bennett" - name: "Michael Innerberger" + - name: "Mark Kittisopikul" - name: "Stephan Saalfeld" - name: "Virginia Scarlett" date: "5/10/2024" From 3cd7619695dbfb1eac4cb86a4fda38939f3ac1c3 Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Mon, 24 Jun 2024 15:20:19 -0400 Subject: [PATCH 17/18] multi-dim arrays rm programming languages table --- posts/2024-05-22-multi-dim-arrays.qmd | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/posts/2024-05-22-multi-dim-arrays.qmd b/posts/2024-05-22-multi-dim-arrays.qmd index ad2dc08..8e725fe 100644 --- a/posts/2024-05-22-multi-dim-arrays.qmd +++ b/posts/2024-05-22-multi-dim-arrays.qmd @@ -265,17 +265,8 @@ Applications and workflows that make use of image 
geometry most commonly use car 4) [n5 ordering discussion](https://github.com/saalfeldlab/n5/issues/31) 5) [multi-dimensional arrays in vigra](http://ukoethe.github.io/vigra/doc-release/vigranumpy/index.html#more-on-the-motivation-and-use-of-axistags) -## Appendix -### Programming languages - -| C-order | F-order | -| ------- | ------- | -| C | Fortran | -| C++ | Java | -| Python | Matlab | - -### Related terms +## Related terms | C-order | F-order | | --------------------- | ------------------------ | From 07a242319c5af55af32aad6ebdd40c7d29459765 Mon Sep 17 00:00:00 2001 From: John Bogovic Date: Mon, 24 Jun 2024 15:23:06 -0400 Subject: [PATCH 18/18] multi-dim arrays assorted edits * rework stride and related definitions * clearer recommendations re memory layout * recommendations re dimension naming --- posts/2024-05-22-multi-dim-arrays.qmd | 69 ++++++++++++++++----------- 1 file changed, 40 insertions(+), 29 deletions(-) diff --git a/posts/2024-05-22-multi-dim-arrays.qmd b/posts/2024-05-22-multi-dim-arrays.qmd index 8e725fe..932c534 100644 --- a/posts/2024-05-22-multi-dim-arrays.qmd +++ b/posts/2024-05-22-multi-dim-arrays.qmd @@ -16,10 +16,10 @@ date: "5/10/2024" ## Summary and recommendations Conventions for indexing and storing multi-dimensional data vary across programming languages and software libraries. -Data sharing across software with different systems is technically simple *so long as the convention used is clearly +Data sharing across software with different conventions is technically simple *so long as the convention used is clearly documented.* For example, one software library might store and save point coordinates as `(x,y,z)`, while another stores them as `(z,y,x)`. Interoperability is easy (just "reverse the numbers") only if it is possible to determine -what convention was used to produce the data, and software can determine what convention is used only if it is +what convention was used to produce the data. 
Users and software can determine what convention is used only if it is clearly and explicitly communicated. To enable seamless data and metadata sharing, software and storage formats should clearly and explicitly communicate: @@ -30,8 +30,8 @@ To enable seamless data and metadata sharing, software and storage formats shoul Ambiguities should be clarified by (1) explicitly documenting the array indexing convention (most often C-order or F-order), and (2) referring to array dimensions and (3) coordinates with a consistent order and/or with labels that are -independent of order (e.g. `['x', 'y', 'z', 't', ...]`, see -[xarray](https://docs.xarray.dev/en/latest/getting-started-guide/why-xarray.html)) +independent of order (e.g. `['x', 'y', 'z', 't', ...]` (see +[xarray](https://docs.xarray.dev/en/latest/getting-started-guide/why-xarray.html)). ::: {.callout-tip collapse="true" appearance="minimal"} # Examples @@ -58,22 +58,18 @@ dimensions. We will also call elements of this tuple "indexes." For example, the array where `i` is the "first" index, and `k` is the "last" index. Here, we will consider only the non-negative integers as valid indexes for arrays, though different contexts may use a different index set. -Multi-dimensional arrays are often stored as one-dimensional (1D), or "flat," arrays that are interpreted, or "reshaped," into +Multi-dimensional arrays are often stored as one-dimensional (1D), or "flat," arrays that are interpreted or "reshaped" into a multi-dimensional array by mapping the n-tuple of coordinates to a single index into the 1D array. The two most common conventions for this mapping are C-order and F-order. -In this article, we will refer to n-dimensional arrays as simply "arrays" and 1D arrays as "flat." - +In this article, we will refer to n-dimensional arrays as simply "arrays" and 1D arrays as "flat arrays." 
#### Reshaping arrays and stride One can think of reshaping a 1D array as a recursive process of grouping a number of adjacent elements. An n-dimensional array can be reshaped to an (n+1)-dimensional array by grouping a number adjacent elements belonging to the same dimension. - -* **Define:** the *stride* of a dimension is the (positive) step in the flat array that corresponds to the adjacent element - along that dimension. - -The stride of a dimension is the product of sizes of all previous dimensions. +The [*stride*](https://en.wikipedia.org/wiki/Stride_of_an_array) of each dimension can be used to communicate how indexes +relate to memory layout, but it is more typical to specify C-order or F-order (see below). ::: {.callout-note collapse="true" appearance="minimal"} # Note on "units" of stride @@ -86,10 +82,11 @@ the smallest stride possible would be 4 bytes: the next element is "4 bytes away ::: -* **Define:** the "fastest" or "inner" dimension is the dimension with a stride of 1. -* **Define:** the "slowest" or "outer" dimension is the dimension with the largest stride. +Other terms that are useful for communicating memory layout include: -* **Define:** the *size* of a dimension is the number of grouped elements. +* The **"fastest"** or **"inner"** dimension is the dimension with a stride of 1. +* The **"slowest"** or **"outer"** dimension is the dimension with the largest stride. +* The **size** of a dimension is the number of grouped elements. The size of an n-dimensional array is described by a list of sizes per dimension. For example: `[3, 5, 7].` In this example, the *first* dimension has size `3`, the *last* dimension has size `7`. @@ -123,8 +120,8 @@ Notice: The terms C-order and F-order come from conventions for indexing arrays in the C and Fortran programming languages.
-* **C-order indexing:**: the fastest dimension corresponds to the last index, the slowest dimension corresponds to the first index -* **F-order indexing:**: the slowest dimension corresponds to the last index, the fastest dimension corresponds to the first index +* **C-order indexing:** the fastest dimension corresponds to the last index; the slowest dimension corresponds to the first index +* **F-order indexing:** the slowest dimension corresponds to the last index; the fastest dimension corresponds to the first index ::: {.callout-tip collapse="true" appearance="minimal"} @@ -148,6 +145,22 @@ Consider again an array of size `[3, 5, 7]`, but using F-order indexing. Again, dimension has size `7`. Now, however, using F-order, the *first* dimension will have stride `1`, the *second* dimension will have stride `3`, and the *third* dimension will have stride `3*5 = 15`. +**Recommendation:** Communicate the relationship of array indexes and sizes to memory layout. Consider using the language +in the examples below or similar. + +::: {.callout-tip collapse="true" appearance="minimal"} +# Examples + +These are provided as effective examples, not as the only way to communicate information. + +### Example 1 +> Array sizes and coordinates are stored in C-order. + +### Example 2 +> Coordinates are ordered `X,Y,Z`, where `X` is contiguous in memory. + +::: + ## Dimension semantics The dimensions of a multi-dimensional array sometimes come with additional semantics depending @@ -209,11 +222,11 @@ the index position (first/second). ### Images -Two-dimensional images are often stored as arrays where two dimensions vary the horizontal and vertical positions of the -samples, and as a result these dimensions should be displayed horizontally and vertically, respectively.
Most formats for +Two-dimensional images are often stored as arrays where two of the array dimensions vary the horizontal and vertical positions of the +samples, and as a result these dimensions should be displayed horizontally and vertically, respectively. Most formats for storing "natural" images store data such that the "horizontal axis" / rows have a smaller stride than the "vertical axis" / columns. Note: while rows have smaller stride than columns, it is common for rows not to have stride 1, for example when using -"interleaved" color components, the "color" dimension will have a stride of 1. Biomedical images do not typically have +"interleaved" color components, the color dimension will have a stride of 1. Biomedical images do not typically have "horizontal" or "vertical" dimensions, but may have other semantics (e.g., anatomical, or related to the imaging system). **Recommendation:** Software and storage formats should be explicit about any semantics that are attached to dimensions and @@ -242,20 +255,18 @@ and refer to specific samples, or continuous, referring to points "in-between" a ::: - - ### cartesian coordinates -In contrast to the matrix row/column index convention, cartesian -coordinates label the horizontal and vertical dimensions `x` and `y`, -respectively. Referencing positions in the 2D plane is done using -ordered two-tuples `(x,y)`, where `x` is conventionally the left-index -and `y` is the right-index. Using cartesian coordinates, varying -the left dimensions varies horizontal position, and varying the right -dimension varies the vertical position. +In contrast to the matrix row/column index convention, cartesian coordinates label the horizontal and vertical dimensions `x` +and `y`, respectively. Referencing positions in the 2D plane is done using ordered two-tuples `(x,y)`, where `x` is +conventionally the first index and `y` is the second index. 
Using cartesian coordinates, varying the first dimension varies the +horizontal position, and varying the second dimension varies the vertical position. Applications and workflows that make use of image geometry most commonly use cartesian coordinates. +**Recommendation:** Name your dimensions `x`, `y`, or `z` only if they are spatial. Name the "horizontal" and "vertical" +dimensions `x` and `y`, respectively, if those semantics are relevant. + ## References