4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -21,7 +21,7 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
"sphinx.ext.napoleon",
"sphinx.ext.napoleon",
"sphinx.ext.intersphinx",
"sphinx.ext.viewcode",
"sphinx.ext.doctest",
@@ -30,7 +30,7 @@
"sphinx_design",
"sphinxcontrib.autodoc_pydantic",
"sphinx_tabs.tabs",
"sphinx_copybutton",
"sphinx_copybutton",
"enum_tools.autoenum",
]

115 changes: 115 additions & 0 deletions docs/source/local_get_structure.md
@@ -0,0 +1,115 @@
# Inspecting Local File Structure

:::{admonition} You Will Learn:
:class: note
- How to inspect the branch structure of local ROOT files with `local_get_structure()`
- How to filter branches, print output, and save results to a text file
- How to get an Awkward Array type representation of the file structure
:::

This page describes `local_get_structure()`, a utility that reads the TTree and branch structure of local ROOT files directly, without going through ServiceX or a container backend.

## Overview

`local_get_structure()` is useful when you want to quickly explore a ROOT file before writing a query. It reads the file structure using `uproot` and formats the output as a readable tree summary showing each TTree, its branches, and their dtypes.

## Basic Usage

Pass one or more file paths and a LocalX `Config` object:

```python
from servicex_local import local_get_structure, xAODConfig

config = xAODConfig(release=25)

result = local_get_structure("path/to/file.root", config)
print(result)
```

The return value is a formatted string. Example output:

```
---------------------------
📁 Sample: path/to/file.root
---------------------------

File Metadata ℹ️ :

No FileMetaData found in dataset.

---------------------------

File structure with branch filter 🌿 '':


🌳 Tree: background
├── Branches:
│ ├── branch1 ; dtype: AsDtype('>f8')
│ ├── branch2 ; dtype: AsDtype('>f8')

🌳 Tree: signal
├── Branches:
│ ├── branch1 ; dtype: AsDtype('>f8')
```

## Dataset Input Formats

`local_get_structure()` accepts the same input formats as `local_deliver()`:

| Input type | Behaviour |
|---|---|
| `str` | Single file path. The path is used as the sample name. |
| `list[str]` | Multiple file paths, each used as its own sample name. |
| `dict` | Maps custom sample names to file paths: `{"my_sample": "path/to/file.root"}`. |

```python
# Multiple files
result = local_get_structure(["file1.root", "file2.root"], config)

# Custom sample names
result = local_get_structure({"signal": "sig.root", "background": "bkg.root"}, config)
```
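
One way to think about the three input forms is that they all reduce to a mapping from sample name to file path. The sketch below assumes exactly the behaviour described in the table; `normalize_datasets` is a hypothetical helper written for illustration and is not part of `servicex_local`.

```python
from typing import Dict, List, Union

DatasetInput = Union[str, List[str], Dict[str, str]]


def normalize_datasets(datasets: DatasetInput) -> Dict[str, str]:
    """Reduce any accepted input form to {sample_name: file_path} (hypothetical helper)."""
    if isinstance(datasets, str):
        # A single path doubles as its own sample name.
        return {datasets: datasets}
    if isinstance(datasets, dict):
        # Custom sample names are kept as given.
        return dict(datasets)
    if isinstance(datasets, list):
        # Each path in the list becomes its own sample name.
        return {path: path for path in datasets}
    raise TypeError(f"Unsupported dataset input: {type(datasets)}")


print(normalize_datasets("file.root"))
print(normalize_datasets(["file1.root", "file2.root"]))
print(normalize_datasets({"signal": "sig.root", "background": "bkg.root"}))
```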

## Filtering Branches

Use the `filter_branch` keyword argument to show only branches whose names contain a given string:

```python
result = local_get_structure("file.root", config, filter_branch="Electron")
```

Only branches with `"Electron"` in their name will appear in the output.
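
The filter described here behaves like a plain substring test on each branch name. The following sketch assumes a case-sensitive match and that an empty filter keeps every branch, as the `''` filter in the example output suggests; `filter_branches` is a hypothetical name, not the library's function.

```python
from typing import Iterable, List


def filter_branches(branch_names: Iterable[str], filter_branch: str = "") -> List[str]:
    """Keep branch names containing `filter_branch` (assumed case-sensitive)."""
    # The empty string is contained in every name, so the default keeps everything.
    return [name for name in branch_names if filter_branch in name]


branches = ["AnalysisElectronsAuxDyn.pt", "AnalysisMuonsAuxDyn.pt", "EventInfoAuxDyn.runNumber"]
print(filter_branches(branches, "Electron"))
print(filter_branches(branches))
```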

## Printing Directly

Pass `do_print=True` to print the structure to the terminal instead of returning a string:

```python
local_get_structure("file.root", config, do_print=True)
```

When `do_print=True`, the function returns `None`.

## Saving to a File

Pass `save_to_txt=True` to write the output to `samples_structure.txt` in the current directory:

```python
local_get_structure("file.root", config, save_to_txt=True)
```

The function returns the message `"File structure saved to 'samples_structure.txt'."` when this option is used.
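
As a rough picture of this option, the sketch below writes a summary string to `samples_structure.txt` and returns the same message. It is an assumption about the shape of the behaviour, not the library's code: `save_structure` and its `directory` parameter are hypothetical, and a temporary directory is used so the example does not touch the current directory.

```python
import tempfile
from pathlib import Path


def save_structure(text: str, directory: Path, filename: str = "samples_structure.txt") -> str:
    """Write the summary text to a file and return a confirmation message (hypothetical helper)."""
    out_path = directory / filename
    out_path.write_text(text, encoding="utf-8")  # UTF-8 so the tree glyphs survive
    return f"File structure saved to '{filename}'."


with tempfile.TemporaryDirectory() as tmp:
    message = save_structure("🌳 Tree: signal\n", Path(tmp))
    saved = (Path(tmp) / "samples_structure.txt").read_text(encoding="utf-8")

print(message)
```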

## Getting an Array Type Representation

Pass `array_out=True` to get an Awkward Array type object instead of the formatted string. This returns a dictionary mapping each sample name to an `ak.Array` type that mirrors the TTree structure with correct field names and dtypes:

```python
types = local_get_structure("file.root", config, array_out=True)
```

This is useful for verifying that branch names and types match what a query expects before running through ServiceX.
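
The kind of check this enables can be sketched with plain Python. The example below does not use the object returned by `array_out=True`; it assumes the branch names have already been extracted into a dictionary of tree name to branch names, and `missing_branches` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def missing_branches(
    available: Dict[str, List[str]], tree: str, required: Iterable[str]
) -> List[str]:
    """Return the required branch names that are absent from `tree` (hypothetical helper)."""
    present = set(available.get(tree, []))  # an unknown tree has no branches
    return [name for name in required if name not in present]


# Stand-in for branch names taken from the file structure.
available = {"background": ["branch1", "branch2"], "signal": ["branch1"]}

print(missing_branches(available, "signal", ["branch1", "branch2"]))
print(missing_branches(available, "background", ["branch1", "branch2"]))
```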

:::{note}
Do not combine `array_out=True` with `save_to_txt=True` or `do_print=True`. When `array_out=True` is set, the formatting keyword arguments are ignored.
:::
22 changes: 22 additions & 0 deletions samples_structure.txt
@@ -0,0 +1,22 @@

---------------------------
📁 Sample: test_file
---------------------------

File Metadata ℹ️ :

No FileMetaData found in dataset.

---------------------------

File structure with branch filter 🌿 '':


🌳 Tree: background
├── Branches:
│ ├── branch1 ; dtype: AsDtype('>f8')
│ ├── branch2 ; dtype: AsDtype('>f8')

🌳 Tree: signal
├── Branches:
│ ├── branch1 ; dtype: AsDtype('>f8')
8 changes: 2 additions & 6 deletions servicex_local/__init__.py
@@ -1,7 +1,3 @@
from .science_images import WSL2ScienceImage, DockerScienceImage # noqa: F401
from .science_images import SingularityScienceImage # noqa: F401
from .codegen import LocalXAODCodegen, DockerCodegen # noqa: F401
from .adaptor import SXLocalAdaptor # noqa: F401
from .deliver import local_deliver # noqa: F401
from .configurations import xAODConfig # noqa: F401
from .utils import install_sx_local, Platform # noqa: F401
from .configurations import xAODConfig, Platform, Config # noqa: F401
from .utils import local_get_structure # noqa: F401
10 changes: 9 additions & 1 deletion servicex_local/configurations.py
@@ -4,7 +4,15 @@
import urllib.request
from dataclasses import dataclass
from typing import TYPE_CHECKING, Union
from .utils import Platform
from enum import Enum


class Platform(Enum):
    """Options for which platform to use for the runtime environment."""

    docker = "docker"
    singularity = "singularity"
    wsl2 = "wsl2"


if TYPE_CHECKING:
120 changes: 109 additions & 11 deletions servicex_local/deliver.py
@@ -3,7 +3,7 @@
import logging
from datetime import datetime
from pathlib import Path
from typing import Any, Generator, List
from typing import Any, Generator, List, Union, Mapping
from deprecated import deprecated

from make_it_sync import make_sync
@@ -12,15 +12,111 @@
from servicex.models import ResultFormat, TransformRequest, TransformStatus
from servicex.query_core import QueryStringGenerator
from servicex.servicex_client import GuardList
from servicex.yaml_parser import YAML

from servicex_local import SXLocalAdaptor
from servicex_local.adaptor import MinioLocalAdaptor
from .adaptor import SXLocalAdaptor, MinioLocalAdaptor
from .codegen import LocalXAODCodegen
from .configurations import Config, Platform
from servicex_analysis_utils import to_awk

from .configurations import Config
logger = logging.getLogger(__name__)

from servicex_local.utils import install_sx_local
from servicex_local.utils import Platform as _SxPlatform
from servicex_analysis_utils import to_awk

def _load_ServiceXSpec(
    config: Union[ServiceXSpec, Mapping[str, Any], str, Path],
) -> ServiceXSpec:
    if isinstance(config, Mapping):
        logger.debug("Config from dictionary")
        config = ServiceXSpec(**config)
    elif isinstance(config, ServiceXSpec):
        logger.debug("Config from ServiceXSpec")
    elif isinstance(config, str) or isinstance(config, Path):
        logger.debug("Config from file")

        if isinstance(config, str):
            file_path = Path(config)
        else:
            file_path = config

        import sys

        yaml = YAML()

        if sys.version_info < (3, 10):
            from importlib_metadata import entry_points
        else:
            from importlib.metadata import entry_points

        plugins = entry_points(group="servicex.query")
        for _ in plugins:
            yaml.register_class(_.load())
        plugins = entry_points(group="servicex.dataset")
        for _ in plugins:
            yaml.register_class(_.load())

        conf = yaml.load(file_path)
        config = ServiceXSpec(**conf)
    else:
        raise TypeError(f"Unknown config type: {type(config)}")

    return config


def install_sx_local(
    image: str, platform: Platform = Platform.docker, host_port: int = 5001
):
    """Set up a local ServiceX endpoint for data transformation.

    Args:
        image (str): Image name for the container.
        platform (Platform): Which platform to use.
        host_port (int): Local host port to expose.

    Returns:
        SXLocalAdaptor: Adaptor for the local ServiceX endpoint.
    """
    from servicex.configuration import Configuration

    try:
        sx_cfg = Configuration.read()
        cache_dir = Path(sx_cfg.cache_path).resolve()
    except NameError:
        import tempfile

        cache_dir = Path(tempfile.mkdtemp()).resolve()
        logging.warning(
            "Could not read a ServiceX.yaml. Using temporary directory %s for cache.",
            cache_dir,
        )

    codegen = LocalXAODCodegen()

    if platform == Platform.docker:
        from .science_images import DockerScienceImage

        science_runner = DockerScienceImage(image)

    elif platform == Platform.singularity:
        from .science_images import SingularityScienceImage

        science_runner = SingularityScienceImage(image)

    elif platform == Platform.wsl2:
        from .science_images import WSL2ScienceImage

        container, release = image.split(":")
        science_runner = WSL2ScienceImage(container, release)

    else:
        raise ValueError(f"Unknown platform {platform}")

    adaptor = SXLocalAdaptor(
        codegen, science_runner, cache_dir, f"http://localhost:{host_port}"
    )

    logging.info(f"Using local ServiceX endpoint: {codegen}")
    logging.info(f"Cache being saved to: {adaptor.cache_dir}")
    return adaptor


def _sample_run_info(
@@ -117,7 +213,7 @@ def _save_cache(cache: dict[str, Any], cache_dir: Path) -> None:


async def deliver_async(
spec: ServiceXSpec,
spec: Union[ServiceXSpec, Mapping[str, Any], str, Path],
adaptor: SXLocalAdaptor,
ignore_local_cache: bool = False,
display_progress: bool = True,
@@ -144,7 +240,9 @@ async def deliver_async(
results: dict[str, GuardList] = {}
cache = _load_cache(adaptor.cache_dir) # Load cache from file system

all_tqs = list(_sample_run_info(spec.General, spec.Sample))
config = _load_ServiceXSpec(spec)

all_tqs = list(_sample_run_info(config.General, config.Sample))
total_files = sum(len(tq.file_list or []) for tq in all_tqs)

with ExpandableProgress(display_progress=display_progress) as progress:
@@ -218,7 +316,7 @@ async def deliver_async(


def local_deliver(
spec: ServiceXSpec,
spec: Union[ServiceXSpec, Mapping[str, Any], str, Path],
config: Config,
display_progress: bool = True,
):
@@ -230,7 +328,7 @@ def local_deliver(
image = f"docker://{_DOCKER_IMAGE}:{config.version}"
else:
image = f"{_DOCKER_IMAGE}:{config.version}"
sx_platform = _SxPlatform(config.platform.value)
sx_platform = Platform(config.platform.value)
adaptor = install_sx_local(image, sx_platform)

sx_result = _deliver_sync(