58 commits
de420cc
Minor cleanup
GBirkel Jan 5, 2026
07b2c20
We're not just doing ingesting anymore!
GBirkel Jan 8, 2026
4588f0a
Adding dataset_metadata_schemas access, and a utility function with a…
GBirkel Jan 8, 2026
e06b7f8
Handling old schema versions and attempting to auto-convert, when imp…
GBirkel Jan 10, 2026
bb1735a
A minor note.
GBirkel Jan 10, 2026
aeda1f1
Minor technical debt
GBirkel Jan 19, 2026
633f311
Merge branch '2026/01/file_record_updating' of github.com:als-computi…
GBirkel Jan 20, 2026
9d9a738
Removing an old note
GBirkel Jan 22, 2026
ffa5cb6
Merge branch 'refs/heads/main' into 2026/01/file_record_updating
GBirkel Jan 22, 2026
0b0be3c
Sort input paths into files vs. folders. If we're given more than one…
GBirkel Jan 22, 2026
f9d4363
These have been moved to dataset_metadata_schemas .
GBirkel Jan 22, 2026
7ba0143
Adding dataset_tracker_client as a dependency.
GBirkel Jan 24, 2026
1bff7ab
Extra arguments for Dataset Tracker config. Seeking an existing als-d…
GBirkel Jan 24, 2026
c1bcb77
Passing the dataset tracker into the ingesters.
GBirkel Jan 24, 2026
80f8836
Partially updating current ingesters to use new function signature. S…
GBirkel Jan 27, 2026
41fa3fb
Ongoing refactor of the main ingest function. Standardizing on one co…
GBirkel Jan 27, 2026
dfc9950
This file seems a bit bloated. :D
GBirkel Jan 27, 2026
cbd2858
"uv sync" is a more sensible invocation since we're using uv.
GBirkel Jan 27, 2026
6febe67
Changing the signature yet again!
GBirkel Jan 27, 2026
3bf6181
Finishing up the ingester refactor. Needs testing.
GBirkel Jan 27, 2026
9983670
Was badly written. For example, turning "time=2000-01-02T12:23:45" in…
GBirkel Jan 28, 2026
dbd9a0e
Partially corrected rewrite of 733 ingester. But this code presents a…
GBirkel Jan 28, 2026
e8dc9dc
Redundant line...
GBirkel Jan 28, 2026
a15c85c
This was deceptively named.
GBirkel Jan 29, 2026
933ff51
Utility functions for building file manifest objects from a folder or…
GBirkel Jan 29, 2026
2011685
Filling out metadata in the file before writing it back to disk. Usin…
GBirkel Jan 29, 2026
1049f6a
Now that I've combed through the rest of it, I'm kind of shocked we'r…
GBirkel Jan 29, 2026
139affc
Minor comment fixup
GBirkel Jan 29, 2026
c0d327d
Not using the "bl" prefix.
GBirkel Jan 29, 2026
0a0aeec
Fail early, fail often!
GBirkel Jan 29, 2026
53421a9
More minor commentary.
GBirkel Jan 29, 2026
5214a8d
Yet another function signature change!
GBirkel Jan 30, 2026
68c6124
Now we accumulate logs into a list, and embed it in the metadata file…
GBirkel Jan 30, 2026
d9dd862
Refactored 733 ingester to use a file manifest. No longer using "issu…
GBirkel Jan 30, 2026
620a850
Refactored 832 ingesters 1 and 2 to use a file manifest. No longer us…
GBirkel Jan 30, 2026
8d3f354
Not using "issues".
GBirkel Feb 2, 2026
17542dc
Updating flow method signature.
GBirkel Feb 2, 2026
821bd7f
About time we added a uv lockfile.
GBirkel Feb 2, 2026
1829162
More function signature updates... I really need to turn all these i…
GBirkel Feb 2, 2026
d598ca5
Bogus import!
GBirkel Feb 2, 2026
87c7c02
A bit more logging.
GBirkel Feb 2, 2026
669c443
We let Prefect pass in a logger, and we need to pass it down to the i…
GBirkel Feb 2, 2026
e5207d5
Is this a Datetime object? Let's find out.
GBirkel Feb 3, 2026
55042f2
"isoformat" does not reliably produce a UTC timestamp that SciCat can…
GBirkel Feb 3, 2026
8736aad
Oops, need separators
GBirkel Feb 3, 2026
50143d7
A bit more time/date cleanup.
GBirkel Feb 3, 2026
d7dabf2
Hey maaaan, you can't log that heeere. You gonna have to move it uptown.
GBirkel Feb 3, 2026
7a889de
This is so out of date we shouldn't keep it. It's been replaced by a …
GBirkel Feb 4, 2026
0f1028b
I couldn't stand it any more, and decided to create an ingester base …
GBirkel Feb 4, 2026
1999242
Import corrections
GBirkel Feb 4, 2026
6f53511
Date handling corrections.
GBirkel Feb 4, 2026
93da0fe
Need to pass beamline/proposal IDs as slugs in record creation.
GBirkel Feb 4, 2026
60b3cad
Updating this just in case. Hopefully no one uses it. ;)
GBirkel Feb 4, 2026
b227ab7
Testing with a different ingester.
GBirkel Feb 4, 2026
3ebf721
Slightly better logging of important values.
GBirkel Feb 4, 2026
726899a
Ooops wrong field name
GBirkel Feb 4, 2026
e679185
Just a bit more logging.
GBirkel Feb 6, 2026
9df7e68
Removing 'issue' and 'severity', as they aren't used by the new class.
GBirkel Mar 1, 2026
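
Commit 55042f2 above notes that "isoformat" does not reliably produce a UTC timestamp. The following is a minimal sketch of that pitfall and one possible workaround, not the code from this PR. The helper name `to_utc_timestamp`, the millisecond precision, and the `Z` suffix are all assumptions made for illustration; the commit message is truncated, so the exact format SciCat expects is not confirmed here.

```python
from datetime import datetime, timedelta, timezone


def to_utc_timestamp(value: datetime) -> str:
    """Format a datetime as a UTC string such as 2000-01-02T12:23:45.000Z."""
    if value.tzinfo is None:
        # Assumption: naive values are already UTC. Adjust if yours are local.
        value = value.replace(tzinfo=timezone.utc)
    value = value.astimezone(timezone.utc)
    # Millisecond precision plus a literal "Z" suffix (assumed target format).
    return value.strftime("%Y-%m-%dT%H:%M:%S.") + f"{value.microsecond // 1000:03d}Z"


naive = datetime(2000, 1, 2, 12, 23, 45)
pacific = datetime(2000, 1, 2, 4, 23, 45, tzinfo=timezone(timedelta(hours=-8)))

print(naive.isoformat())          # 2000-01-02T12:23:45 (no offset at all)
print(pacific.isoformat())        # 2000-01-02T04:23:45-08:00 (not UTC)
print(to_utc_timestamp(naive))    # 2000-01-02T12:23:45.000Z
print(to_utc_timestamp(pacific))  # 2000-01-02T12:23:45.000Z
```

The point of the sketch is that `isoformat()` simply echoes whatever offset the object carries (or none), so normalizing to UTC explicitly before formatting avoids depending on how the datetime was constructed.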
134 changes: 7 additions & 127 deletions .gitignore
@@ -1,134 +1,14 @@
# Byte-compiled / optimized / DLL files
# Python-generated files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
*.py[oc]
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py
*.egg-info
.DS_Store

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
# Virtual environments
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

.vscode/

*.env
.netrc.gpg
# Environment variables (contains secrets)
.env
2 changes: 1 addition & 1 deletion README.md
@@ -26,7 +26,7 @@ If you're developing locally, install dependencies and work in a virtual environ
 ```
 uv venv --python 3.11 # or greater
 source .venv/bin/activate
-uv pip install --all-extras -r pyproject.toml -e .
+uv sync --all-extras
 ```

 Then try running the tests:
4 changes: 3 additions & 1 deletion pyproject.toml
@@ -17,6 +17,8 @@ dependencies = [
     "PyHyperScattering",
     "pymongo",
     "pyscicat @ git+https://github.com/SciCatProject/pyscicat.git@main",
+    "dataset_metadata_schemas @ git+https://github.com/als-computing/dataset_metadata_schemas.git@main",
+    "dataset_tracker_client @ git+https://github.com/als-computing/dataset_tracker_client.git@main",
     "python-dotenv>=1.2.1",
     "requests",
     "typer",
@@ -37,5 +39,5 @@ requires = [
 build-backend = "setuptools.build_meta"

 [tool.setuptools.packages.find]
-where = [".", "src"]
+where = ["src", "."]
 include = ["scicat_beamline*"]
2 changes: 2 additions & 0 deletions requirements.txt
@@ -10,6 +10,8 @@ pydantic>=2.11
 PyHyperScattering
 pymongo
 pyscicat @ git+https://github.com/SciCatProject/pyscicat.git@main
+dataset_tracker_client @ git+https://github.com/als-computing/dataset_tracker_client.git@main
+dataset_metadata_schemas @ git+https://github.com/als-computing/dataset_metadata_schemas.git@main
 python-dotenv>=1.2.1
 requests
 typer
6 changes: 1 addition & 5 deletions scripts/733_batch.py
@@ -3,7 +3,6 @@
 from pathlib import Path
 from typing import List

-from common_ingester_utils import Issue
 from ingesters import als_733_SAXS
 from pyscicat.client import from_token

@@ -12,14 +11,11 @@
 user = sys.argv[3]
 scicat_url_base = sys.argv[4]
 try:
-    issues: List[Issue] = []
     client = from_token(scicat_url_base, token)
     txt_files = folder.glob("**/*.txt")
     with tempfile.TemporaryDirectory() as thumbs_dir:
         for txt_file in list(txt_files):
-            als_733_SAXS.ingest(client, user, txt_file, thumbs_dir, issues)
+            als_733_SAXS.ingest(client, user, txt_file, thumbs_dir)
             print(f"Ingesting {txt_file}")
-            if len(issues) > 0:
-                print(f" Issues found {[str(issue) for issue in issues]}")
 except Exception as e:
     print(e)
23 changes: 17 additions & 6 deletions scripts/manual_ingest.py
@@ -1,5 +1,7 @@
import os
import pathlib
from pathlib import Path
import glob

from dotenv import load_dotenv

@@ -22,21 +24,30 @@
SCICAT_INGEST_OWNER_USERNAME = os.getenv("SCICAT_INGEST_OWNER_USERNAME")
SCICAT_INGEST_PASSWORD = os.getenv("SCICAT_INGEST_PASSWORD")
SCICAT_INGEST_SPEC = os.getenv("SCICAT_INGEST_SPEC")
DATASETTRACKER_URL = os.getenv("DATASETTRACKER_URL")
DATASETTRACKER_USERNAME = os.getenv("DATASETTRACKER_USERNAME")
DATASETTRACKER_PASSWORD = os.getenv("DATASETTRACKER_PASSWORD")
DATASETTRACKER_SHARE_IDENTIFIER = os.getenv("DATASETTRACKER_SHARE_IDENTIFIER")

assert type(SCICAT_INGEST_BASE_FOLDER) == str and len(SCICAT_INGEST_BASE_FOLDER) != 0
assert type(SCICAT_INGEST_URL) == str and len(SCICAT_INGEST_URL) != 0
assert type(SCICAT_INGEST_USERNAME) == str and len(SCICAT_INGEST_USERNAME) != 0
assert type(SCICAT_INGEST_PASSWORD) == str and len(SCICAT_INGEST_PASSWORD) != 0
assert type(SCICAT_INGEST_OWNER_USERNAME) == str and len(SCICAT_INGEST_OWNER_USERNAME) != 0
assert type(SCICAT_INGEST_SPEC) == str and len(SCICAT_INGEST_SPEC) != 0

dataset_path = pathlib.Path(SCICAT_INGEST_BASE_FOLDER, SCICAT_INGEST_SUBFOLDER).resolve()
dataset_path = Path(SCICAT_INGEST_BASE_FOLDER, SCICAT_INGEST_SUBFOLDER).resolve()
dataset_files = [p for p in map(Path, glob.iglob(str(dataset_path) + "/**", recursive=True))]

ingest(
ingester_spec=SCICAT_INGEST_SPEC,
dataset_path=dataset_path,
dataset_files=dataset_files,
ingester_spec=SCICAT_INGEST_SPEC,
owner_username=SCICAT_INGEST_OWNER_USERNAME,
base_url=SCICAT_INGEST_URL,
username=SCICAT_INGEST_USERNAME,
password=SCICAT_INGEST_PASSWORD,
scicat_url=SCICAT_INGEST_URL,
scicat_username=SCICAT_INGEST_USERNAME,
scicat_password=SCICAT_INGEST_PASSWORD,
datasettracker_url=DATASETTRACKER_URL,
datasettracker_username=DATASETTRACKER_USERNAME,
datasettracker_password=DATASETTRACKER_PASSWORD,
datasettracker_share_identifier=DATASETTRACKER_SHARE_IDENTIFIER
)
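
As a reviewer-style aside on the script above: the `assert type(...) == str` checks are skipped when Python runs with `-O`, and the `DATASETTRACKER_*` values are not covered by any assertion in the lines shown. The `glob.iglob(..., recursive=True)` call with a trailing `/**` also appears to yield directories as well as files. Below is a minimal sketch of one way to handle both concerns. The helper names `require_env` and `split_files_and_folders` are hypothetical and are not part of this PR or of `scicat_beamline`.

```python
import os
from pathlib import Path


def require_env(name: str) -> str:
    """Return a non-empty environment variable or raise a descriptive error."""
    value = os.getenv(name)
    if not value:
        # Covers both "unset" (None) and "set but empty" ("").
        raise RuntimeError(f"Environment variable {name} is missing or empty")
    return value


def split_files_and_folders(root: Path) -> tuple[list[Path], list[Path]]:
    """Walk root recursively and separate regular files from directories."""
    files: list[Path] = []
    folders: list[Path] = []
    # rglob("*") yields every entry below root, but not root itself.
    for path in sorted(root.rglob("*")):
        (folders if path.is_dir() else files).append(path)
    return files, folders
```

With helpers like these, the script could call `require_env("DATASETTRACKER_URL")` in place of a bare `os.getenv`, and pass only the `files` list on as `dataset_files`. Whether `ingest` expects folders in that list is not visible in this diff, so treat that as an open question rather than a fix.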
71 changes: 0 additions & 71 deletions src/scicat_beamline/_tests/test_ingest_call.py

This file was deleted.

6 changes: 3 additions & 3 deletions src/scicat_beamline/_tests/test_utils.py
@@ -1,7 +1,7 @@
 import json
 import numpy as np

-from scicat_beamline.utils import NPArrayEncoder, clean_email, UNKNOWN_EMAIL, build_search_terms, calculate_access_controls, Issue
+from scicat_beamline.utils import NPArrayEncoder, clean_email, UNKNOWN_EMAIL, search_terms_from_name, calculate_access_controls


 def test_clean_email_valid():
@@ -48,8 +48,8 @@ def test_np_encoder():
     assert json.dumps(encoded_np, allow_nan=False)


-def test_build_search_terms():
-    terms = build_search_terms("Time-is_an illusion. Lunchtime/2x\\so.")
+def test_search_terms_from_name():
+    terms = search_terms_from_name("Time-is_an illusion. Lunchtime/2x\\so.")
     assert "time" in terms
     assert "is" in terms
     assert "an" in terms
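
For readers without the implementation at hand, here is a hypothetical sketch of a function that would satisfy the assertions visible in this test. It is not the actual `scicat_beamline.utils.search_terms_from_name`, whose body is not part of this diff. The splitting rule (lowercase, then break on any run of non-alphanumeric characters) is an assumption inferred only from the test input.

```python
import re


def search_terms_from_name(name: str) -> list[str]:
    """Hypothetical sketch: lowercase the name and split on non-alphanumerics."""
    # Underscores, hyphens, dots, slashes and backslashes all act as separators.
    parts = re.split(r"[^a-z0-9]+", name.lower())
    # re.split leaves empty strings at the edges; drop them.
    return [part for part in parts if part]


terms = search_terms_from_name("Time-is_an illusion. Lunchtime/2x\\so.")
print(terms)  # ['time', 'is', 'an', 'illusion', 'lunchtime', '2x', 'so']
```

Note that `\W` would not work as the separator class here, because it treats `_` as a word character and would keep `is_an` as a single term.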