Commit fb74afc

Add documentation for data manipulation commands: cut, interpolate, rebin, average, and noise
1 parent e01e8c2 commit fb74afc

6 files changed

Lines changed: 631 additions & 0 deletions

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
1+
---
2+
title: Data Manipulation
3+
parent: Syntax
4+
nav_order: 60
5+
permalink: /commands/data-manipulation/
6+
has_children: true
7+
math: katex
8+
---
9+
10+
# Data Manipulation
11+
12+
Data manipulation commands modify the structure or organization of XAS datasets. These operations include resampling, filtering, merging, and extracting subsets of data to prepare it for analysis or visualization.
13+
14+
EstraPy provides five data manipulation commands:
15+
16+
1. **[`cut`]({{ "/commands/data-manipulation/cut" | relative_url }})** - Extracts a subset of data within a specified range
17+
2. **[`interpolate`]({{ "/commands/data-manipulation/interpolate" | relative_url }})** - Resamples data onto a new uniform grid via interpolation
18+
3. **[`rebin`]({{ "/commands/data-manipulation/rebin" | relative_url }})** - Bins data into fixed intervals and averages within bins
19+
4. **[`average`]({{ "/commands/data-manipulation/average" | relative_url }})** - Merges multiple scans into averaged spectra
20+
5. **[`noise`]({{ "/commands/data-manipulation/noise" | relative_url }})** - Estimates noise levels from even-odd point differences
21+
22+
## Destructive vs. Non-Destructive Operations
23+
24+
{: .warning }
Most data manipulation commands are **destructive**: they replace the working data, and there is no undo. Always verify your parameters before executing.
26+
27+
The original data file is never changed; only the in-memory dataset is modified. To restore the original data, reload the file.
28+
29+
- **Destructive** (`cut`, `interpolate`, `rebin`, `average`): the in-memory data is overwritten
- **Non-destructive** (`noise`): adds a new column without altering existing data
31+
32+
## When to Use Data Manipulation
33+
34+
### Cut
35+
36+
Use `cut` to:
37+
38+
- Remove noisy or unusable regions at the beginning or end of a scan
39+
- Focus analysis on a specific energy or k-range
40+
- Trim data before merging files with different scan ranges
41+
42+
### Interpolate
43+
44+
Use `interpolate` to:
45+
46+
- Resample data onto a finer or coarser grid
47+
- Align multiple scans with different step sizes before averaging
48+
- Prepare data for algorithms requiring uniform spacing
49+
50+
### Rebin
51+
52+
Use `rebin` to:
53+
54+
- Reduce noise by averaging neighboring points
55+
- Downsample high-resolution data for faster processing
56+
- Create uniform bins from irregularly spaced data
57+
58+
### Average
59+
60+
Use `average` to:
61+
62+
- Merge multiple scans of the same sample to improve signal-to-noise
63+
- Combine replicate measurements
64+
- Group scans by metadata (temperature, concentration, etc.)
65+
66+
### Noise
67+
68+
Use `noise` to:
69+
70+
- Assess data quality and identify noisy regions
71+
- Weight data points in fitting routines
72+
- Compare noise levels across different experimental conditions
73+
74+
## Domain Considerations
75+
76+
- `cut`, `interpolate`, and `rebin` can operate in **either domain** (reciprocal or Fourier)
77+
- `average` operates on a **specified domain** (default: reciprocal)
78+
- `noise` operates in the **reciprocal domain** only
79+
80+
## See Also
81+
82+
- **[Cut]({{ "/commands/data-manipulation/cut" | relative_url }})** - Extract data subsets
83+
- **[Interpolate]({{ "/commands/data-manipulation/interpolate" | relative_url }})** - Resample onto new grid
84+
- **[Rebin]({{ "/commands/data-manipulation/rebin" | relative_url }})** - Bin and average
85+
- **[Average]({{ "/commands/data-manipulation/average" | relative_url }})** - Merge multiple scans
86+
- **[Noise]({{ "/commands/data-manipulation/noise" | relative_url }})** - Estimate noise levels
87+
88+
---
89+
90+
**Next:** Choose a command to learn about specific options and usage.

docs/03_syntax/80_1_cut.md

Lines changed: 86 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,86 @@
1+
---
2+
title: Cut
3+
parent: Data Manipulation
4+
grand_parent: Syntax
5+
nav_order: 1
6+
permalink: /commands/data-manipulation/cut/
7+
math: katex
8+
---
9+
10+
# Cut
11+
12+
The `cut` command extracts a subset of data within a specified axis range, discarding all points outside that range. This is a **destructive operation** — original data is permanently removed from the working dataset.
13+
14+
(Note that the original data file remains unchanged; only the in-memory dataset is modified.)
15+
16+
## Basic Usage
17+
18+
```sh
cut <range> [options]
```
21+
22+
where:
23+
24+
- `<range>` is a two-value range defining the region to keep
25+
26+
## Command Options
27+
28+
| Option | Description |
|--------|-------------|
| `<range>` | Range of values to retain (required). All data outside this range is removed. See [range specification]({{ "/commands/general-syntax#range-specification" | relative_url }}) for details. |
| `--axis <axis>` | Axis column to apply the cut (auto-inferred from range units if omitted). |
| `--domain <domain>` | Domain in which to perform the cut (auto-inferred if omitted). Options: `reciprocal`, `fourier`. |
33+
34+
## Range and Unit Inference
35+
36+
If `--axis` and `--domain` are not specified, EstraPy infers them from the range units:
37+
38+
- `eV` or `k` units → reciprocal domain
39+
- `Å` units → Fourier domain
40+
- Axis is selected based on unit type: `E` for eV, `k` for k, `r` for Å
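
The inference rules above amount to a lookup from unit to axis and domain. The sketch below is illustrative only: `UNIT_TO_AXIS_DOMAIN` and `infer_axis_and_domain` are hypothetical names, and EstraPy's actual parser may work differently.

```python
import re

# Hypothetical table mirroring the rules listed above (not EstraPy's internals).
UNIT_TO_AXIS_DOMAIN = {
    "eV": ("E", "reciprocal"),
    "k": ("k", "reciprocal"),
    "A": ("r", "fourier"),
    "Å": ("r", "fourier"),
}


def infer_axis_and_domain(bound: str) -> tuple[str, str]:
    """Split a range bound such as '8000eV' into number and unit, then look up the unit."""
    match = re.fullmatch(r"\s*([-+]?\d*\.?\d+)\s*([A-Za-zÅ]+)\s*", bound)
    if match is None:
        # A bare number carries no unit, so nothing can be inferred from it.
        raise ValueError(f"no unit in {bound!r}; specify the axis and domain explicitly")
    unit = match.group(2)
    if unit not in UNIT_TO_AXIS_DOMAIN:
        raise ValueError(f"unknown unit {unit!r}")
    return UNIT_TO_AXIS_DOMAIN[unit]


axis, domain = infer_axis_and_domain("1.5A")
```

A bound without units raises an error in this sketch, which corresponds to the case where `--axis` and `--domain` have to be given explicitly.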
41+
42+
## Examples
43+
44+
```sh
# Keep only data between 8000 and 9000 eV
cut 8000eV 9000eV

# Cut in k-space (reciprocal domain)
cut 3k 14k

# Cut in R-space (Fourier domain)
cut 1.5A 4.5A

# Explicit axis and domain specification
cut 0 1000 --axis E --domain reciprocal
```
57+
58+
## Behavior
59+
60+
The command:
61+
62+
1. Identifies all data points where the axis value falls within `[range[0], range[1]]`
63+
2. Removes all points outside this range
64+
3. Resets the dataframe index
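
The three steps above can be sketched with pandas. This is a minimal illustration, assuming the working data is a DataFrame with one column per axis; `cut_sketch` is a hypothetical helper, not EstraPy's implementation.

```python
import pandas as pd


def cut_sketch(df: pd.DataFrame, axis: str, low: float, high: float) -> pd.DataFrame:
    """Keep rows whose axis value lies in [low, high] and renumber the index."""
    mask = df[axis].between(low, high)  # inclusive on both ends by default
    return df.loc[mask].reset_index(drop=True)


scan = pd.DataFrame(
    {
        "E": [7990.0, 8000.0, 8500.0, 9000.0, 9010.0],
        "mu": [0.1, 0.2, 0.9, 1.0, 1.0],
    }
)
trimmed = cut_sketch(scan, "E", 8000.0, 9000.0)
```

The sketch returns a new DataFrame and leaves `scan` untouched; the command itself replaces the working dataset, which is why it is described as destructive.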
65+
66+
{: .warning }
**Destructive operation:** Data outside the specified range is deleted from the working dataset and cannot be recovered without reloading the original file. The original data file remains unchanged; only the in-memory dataset is modified.
69+
70+
## Use Cases
71+
72+
- **Remove noisy edges:** Trim beginning/end of scans where detector response is poor
73+
- **Focus analysis:** Isolate specific features (e.g., pre-edge, XANES, EXAFS regions)
74+
- **Align datasets:** Match scan ranges before averaging multiple files
75+
- **Reduce file size:** Discard irrelevant energy regions before saving
76+
77+
## Tips
78+
79+
- Use `cut` early in your workflow to reduce processing time for subsequent commands
80+
- Combine with `rebin` or `interpolate` to create uniform, focused datasets
81+
- For non-destructive range selection in plotting, use range arguments in the plot command instead
82+
83+
**See also:**
84+
85+
- [Data Manipulation overview]({{ "/commands/data-manipulation" | relative_url }})
86+
- [General syntax]({{ "/commands/general-syntax" | relative_url }})

docs/03_syntax/80_2_interpolate.md

Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,99 @@
1+
---
2+
title: Interpolate
3+
parent: Data Manipulation
4+
grand_parent: Syntax
5+
nav_order: 2
6+
permalink: /commands/data-manipulation/interpolate/
7+
math: katex
8+
---
9+
10+
# Interpolate
11+
12+
The `interpolate` command resamples data onto a new uniform grid using cubic spline interpolation. This is a **destructive operation** — the original axis and data values are replaced.
13+
14+
(Note that the original data file remains unchanged; only the in-memory dataset is modified.)
15+
16+
## Basic Usage
17+
18+
```sh
interpolate <range> <--interval <step> | --number <n>> [options]
```
21+
22+
where:
23+
24+
- `<range>` defines the new axis limits
25+
- Either `--interval` or `--number` specifies grid spacing (exactly one required)
26+
27+
## Command Options
28+
29+
| Option | Description |
|--------|-------------|
| `<range>` | Range for the new interpolated axis (required). See [range specification]({{ "/commands/general-syntax#range-specification" | relative_url }}) for details. |
| `--interval <step>` | Fixed step size for the new axis (mutually exclusive with `--number`). Must include units. |
| `--number <n>` | Number of evenly-spaced points in the new axis (mutually exclusive with `--interval`). |
| `--axis <axis>` | Axis column for interpolation (auto-inferred from range units if omitted). |
| `--domain <domain>` | Domain in which to interpolate (auto-inferred if omitted). Options: `reciprocal`, `fourier`. |
36+
37+
## Interpolation Method
38+
39+
EstraPy uses **cubic spline interpolation** (degree 3) via `scipy.interpolate.make_interp_spline`. This provides:
40+
41+
- Smooth, continuous curves that pass through every existing data point
- C² continuity (continuous first and second derivatives)
- Little oscillation between points when the data are smooth and densely sampled
44+
45+
## Examples
46+
47+
```sh
# Interpolate to 0.5 eV steps over 8000-9000 eV
interpolate 8000eV 9000eV --interval 0.5eV

# Interpolate to exactly 500 points in k-space
interpolate 3k 15k --number 500

# Fine r-space grid with 0.01 Å spacing
interpolate 0A 6A --interval 0.01A --domain fourier

# Explicit axis specification
interpolate 0 1000 --number 2000 --axis E --domain reciprocal
```
60+
61+
## Behavior
62+
63+
The command:
64+
65+
1. Creates a new uniform axis within `[range[0], range[1]]` with specified spacing or count
66+
2. Interpolates all data columns onto this new axis using cubic splines
67+
3. Replaces the domain's data with interpolated values
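
The steps above can be sketched with NumPy and `scipy.interpolate.make_interp_spline`. This is a rough illustration for a single data column; `interpolate_sketch` is a hypothetical helper, and the grid construction (in particular, whether the upper bound is included when `--interval` is used) is an assumption rather than documented EstraPy behavior.

```python
import numpy as np
from scipy.interpolate import make_interp_spline


def interpolate_sketch(x, y, start, stop, interval=None, number=None):
    """Resample y(x) onto a uniform grid in [start, stop] with a cubic spline."""
    if (interval is None) == (number is None):
        raise ValueError("give exactly one of interval or number")
    if number is not None:
        new_x = np.linspace(start, stop, number)  # both endpoints included
    else:
        # Assumption: points at start, start + interval, ... up to stop.
        n = int(np.floor((stop - start) / interval + 1e-9)) + 1
        new_x = start + interval * np.arange(n)
    spline = make_interp_spline(x, y, k=3)  # x must be strictly increasing
    return new_x, spline(new_x)


# Synthetic data, for illustration only.
x = np.linspace(0.0, 10.0, 21)
y = np.sin(x)
new_x, new_y = interpolate_sketch(x, y, 2.0, 8.0, interval=0.5)
```

The spline is built from all input points and only evaluated inside the requested range, so the result stays close to the underlying curve as long as that range lies within the measured data.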
68+
69+
{: .warning }
**Destructive operation:** Original axis values and data are replaced with interpolated results. The original data file remains unchanged; only the in-memory dataset is modified. Interpolation cannot add information: it only resamples existing data, and it may introduce artifacts such as overshoot if the original data is sparse or noisy.
71+
72+
## Use Cases
73+
74+
- **Uniform grids:** Convert irregularly-spaced data to fixed intervals
75+
- **Alignment:** Resample multiple scans to identical grids before averaging
76+
- **Upsampling:** Increase point density for smoother plots (cosmetic only)
77+
- **Downsampling:** Reduce point count while preserving overall shape
78+
79+
## Tips and Best Practices
80+
81+
1. **Avoid excessive upsampling:** Interpolation does not improve resolution or add real information
82+
2. **Preserve Nyquist limits:** When downsampling, keep the new interval small enough to place at least two points across the narrowest spectral feature you need to resolve
83+
3. **Check endpoint behavior:** Cubic splines may oscillate near range boundaries if data is noisy
84+
4. **Pre-filter noise:** Consider using `rebin` instead if your goal is noise reduction
85+
86+
## Comparison with Rebin
87+
88+
| Feature | `interpolate` | `rebin` |
|---------|---------------|---------|
| Method | Cubic spline interpolation | Binning + averaging |
| Noise reduction | No | Yes (averages reduce noise) |
| Speed | Moderate | Fast |
| Use case | Resampling to new grid | Noise reduction + downsampling |
94+
95+
**See also:**
96+
97+
- [Data Manipulation overview]({{ "/commands/data-manipulation" | relative_url }})
98+
- [Rebin]({{ "/commands/data-manipulation/rebin" | relative_url }})
99+
- [General syntax]({{ "/commands/general-syntax" | relative_url }})

docs/03_syntax/80_3_rebin.md

Lines changed: 116 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,116 @@
1+
---
2+
title: Rebin
3+
parent: Data Manipulation
4+
grand_parent: Syntax
5+
nav_order: 3
6+
permalink: /commands/data-manipulation/rebin/
7+
math: katex
8+
---
9+
10+
# Rebin
11+
12+
The `rebin` command bins data into fixed intervals and averages all points within each bin. This is a **destructive operation** that reduces noise but permanently replaces original data.
13+
14+
(Note that the original data file remains unchanged; only the in-memory dataset is modified.)
15+
16+
## Basic Usage
17+
18+
```sh
rebin <range> <--interval <step> | --number <n>> [options]
```
21+
22+
where:
23+
24+
- `<range>` defines the binning limits
25+
- Either `--interval` or `--number` specifies bin size (exactly one required)
26+
27+
## Command Options
28+
29+
| Option | Description |
|--------|-------------|
| `<range>` | Range for rebinning (required). Must have finite bounds. See [range specification]({{ "/commands/general-syntax#range-specification" | relative_url }}) for details. |
| `--interval <step>` | Fixed bin width (mutually exclusive with `--number`). Must include units. |
| `--number <n>` | Number of bins to create (mutually exclusive with `--interval`). |
| `--axis <axis>` <br> `-x <axis>` | Axis column for binning (auto-inferred from range units if omitted). |
| `--domain <domain>` | Domain in which to rebin (default: `reciprocal`). Options: `reciprocal`, `fourier`. |
| `--fix-points` | Interpolate bin averages back to exact bin centers (preserves target axis values). |
37+
38+
## Binning Method
39+
40+
The command:
41+
42+
1. Creates bin boundaries at half-intervals: `[range[0] - step/2, ..., range[1] + step/2]`
43+
2. Assigns each data point to a bin
44+
3. Computes the mean of all points in each bin (all columns)
45+
4. Optionally interpolates to exact bin centers if `--fix-points` is set
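
Steps 1 to 3 above can be sketched with NumPy for a single data column. `rebin_sketch` is a hypothetical helper written for illustration; how EstraPy treats empty bins and points that fall exactly on a bin boundary is not stated here, so those choices are assumptions.

```python
import numpy as np


def rebin_sketch(x, y, start, stop, step):
    """Average x and y inside bins centered at start, start + step, ..., stop."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_bins = int(round((stop - start) / step)) + 1
    centers = start + step * np.arange(n_bins)
    # Boundaries sit half a step on either side of each center.
    edges = np.concatenate(([start - step / 2], centers + step / 2))
    idx = np.digitize(x, edges) - 1           # bin index of each point
    inside = (idx >= 0) & (idx < n_bins)      # points outside the extended range are dropped
    counts = np.bincount(idx[inside], minlength=n_bins)
    sum_x = np.bincount(idx[inside], weights=x[inside], minlength=n_bins)
    sum_y = np.bincount(idx[inside], weights=y[inside], minlength=n_bins)
    filled = counts > 0                       # assumption: empty bins are skipped
    return sum_x[filled] / counts[filled], sum_y[filled] / counts[filled], centers[filled]


x = [0.0, 0.2, 0.9, 1.1, 2.0, 2.3, 5.0]
y = [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 100.0]
mean_x, mean_y, centers = rebin_sketch(x, y, 0.0, 2.0, 1.0)
```

In this example the averaged axis values differ from the nominal centers 0, 1 and 2 wherever the points in a bin are not symmetric about its center, and the point at 5.0 is dropped because it lies outside the extended range.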
46+
47+
## Fix Points Option
48+
49+
- **Default (no `--fix-points`):** Each bin's axis value is the mean of the axis values of the points it contains, so it may not coincide exactly with the target grid
- **With `--fix-points`:** After averaging, cubic spline interpolation moves the points onto the exact bin centers
51+
52+
Use `--fix-points` when you need bin centers at precise positions (e.g., for subsequent operations requiring exact alignment between files).
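
A minimal sketch of the repositioning step, assuming the bin averages are already available as arrays. The numbers are made up for illustration, and the handling of centers that fall outside the averaged axis values (which would require extrapolation) is deliberately left out.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Bin averages: the axis values have drifted away from the nominal centers.
mean_x = np.array([0.1, 1.0, 2.15, 2.9, 4.05])
mean_y = mean_x**2  # synthetic signal, for illustration only

# Nominal centers that lie inside the range covered by mean_x.
centers = np.array([1.0, 2.0, 3.0, 4.0])

# Cubic spline through the bin averages, evaluated at the exact centers.
spline = make_interp_spline(mean_x, mean_y, k=3)
fixed_y = spline(centers)
```

After this step every file rebinned with the same range and interval shares the same axis values, which is what makes point-by-point operations between files possible.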
53+
54+
## Examples
55+
56+
```sh
# Rebin energy axis to 1 eV intervals
rebin 8000eV 9000eV --interval 1eV

# Create exactly 100 bins in k-space
rebin 2k 14k --number 100

# Rebin Fourier domain with fixed bin centers
rebin 0A 5A --interval 0.1A --domain fourier --fix-points

# Explicit axis specification
rebin 0 1000 --interval 2 --axis E --domain reciprocal
```
69+
70+
## Behavior
71+
72+
Rebinning:
73+
74+
- **Reduces noise** by averaging multiple points within each bin
75+
- **Downsamples** high-resolution data for faster processing
76+
- **Dampens outliers** by diluting an isolated bad point among the consistent points in its bin (the mean reduces its influence but does not remove it)
77+
- **Smooths** irregular step sizes into uniform intervals
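
For independent noise of equal standard deviation σ, the mean of N points has standard deviation σ/√N. The sketch below checks this with synthetic Gaussian noise; it is not EstraPy code, and real scans may have correlated or energy-dependent noise for which the gain is smaller.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 1.0          # noise level of a single point
points_per_bin = 16  # number of points averaged in each bin
n_bins = 5000        # many bins, to estimate the scatter reliably

# Pure noise: one row per bin, one column per point in the bin.
noise = rng.normal(0.0, sigma, size=(n_bins, points_per_bin))
bin_means = noise.mean(axis=1)

measured = bin_means.std()
expected = sigma / np.sqrt(points_per_bin)
```

With 16 points per bin the expected noise is a quarter of the single-point value, and `measured` should land close to `expected`.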
78+
79+
{: .warning }
**Destructive operation:** Original data is replaced with binned averages. The original data file remains unchanged; only the in-memory dataset is modified. Data points outside the extended range `[range[0] - step/2, range[1] + step/2]` are discarded.
81+
82+
## Use Cases
83+
84+
- **Noise reduction:** Average out high-frequency noise while preserving overall shape
85+
- **Downsampling:** Reduce point count from oversampled scans
86+
- **Uniformity:** Convert variable-spacing scans to fixed intervals
87+
- **Pre-processing for averaging:** Align multiple scans to identical grids before merging
88+
89+
## Tips and Best Practices
90+
91+
1. **Choose an appropriate bin size:**
   - Too small → minimal noise reduction
   - Too large → loss of spectral features
   - Typical: 0.5–2 eV for energy, 0.05–0.2 Å⁻¹ for k-space
96+
97+
2. **Check data spacing:** Bin width should be larger than original step size to achieve averaging
98+
99+
3. **Use for noise, not upsampling:** Rebinning cannot increase resolution. Use `interpolate` to upsample, keeping in mind that interpolation only resamples existing data and adds no information
100+
101+
4. **Edge effects:** Points outside the extended range are discarded, and edge bins that reach beyond the measured data may contain fewer points, so they benefit less from averaging
102+
103+
## Comparison with Interpolate
104+
105+
| Feature | `rebin` | `interpolate` |
|---------|---------|---------------|
| Method | Binning + averaging | Cubic spline interpolation |
| Noise reduction | Yes (averaging reduces noise) | No |
| Outlier handling | Damped (an outlier is diluted by the bin mean) | Sensitive (the spline passes through every point) |
| Use case | Noise reduction + downsampling | Resampling to new grid |
111+
112+
**See also:**
113+
114+
- [Data Manipulation overview]({{ "/commands/data-manipulation" | relative_url }})
115+
- [Interpolate]({{ "/commands/data-manipulation/interpolate" | relative_url }})
116+
- [General syntax]({{ "/commands/general-syntax" | relative_url }})
