Skip to content

Tiled_publish#52

Merged
taxe10 merged 13 commits intomainfrom
tiled_publish
Sep 10, 2025
Merged

Tiled_publish#52
taxe10 merged 13 commits intomainfrom
tiled_publish

Conversation

@dylanmcreynolds
Copy link
Copy Markdown
Member

@dylanmcreynolds dylanmcreynolds commented Jul 27, 2025

Edited
This PR introduces changes to the arroyo pieces of the latent space explorer.

  1. A publisher that saves 2d vector output to a local sqlite database. We wanted to be able to capture this quickly. Should it go into Tiled instead? I can see plusses and minuses. Putting it into our own database allows for more targetted indexing and querying, which doesn't matter right now but...I would really love to also add timing for the two processing steps as columns in the table so we can see this over time. I don't know if that kind of thing would ever be searchable in tiled.
  2. Changes to pyproject.toml to add asycnio unit tests
  3. Unit tests for this new publisher
  4. Slightly unrelated, this PR also includes changes to clientside.js that we needed to get live processing done on the types of links that the tiled websocket creates.

@dylanmcreynolds dylanmcreynolds requested a review from taxe10 August 5, 2025 23:35
@xiaoyachong xiaoyachong mentioned this pull request Aug 11, 2025
Copy link
Copy Markdown
Contributor

@xiaoyachong xiaoyachong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Dylan, thanks for your PR.

The tests run well on my end. I made a small change to the unit test to prevent it from running indefinitely and added code to calculate processing time. I’m not sure if the results need to be saved to Tiled instead of the local .db—a Tiled publisher could be added in the future if needed.

A companion PR is available at als-computing/arroyosas#19, which reads the tiled_url from the local .db file and sends it to LSE.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes to this file to fix a Python unit test that wouldn’t stop running.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add codes for calculating processing time for both autoencoder and dimension reduction.

@dylanmcreynolds
Copy link
Copy Markdown
Member Author

@xiaoyachong , thanks. This feels mergable to me. Do you agree?

@xiaoyachong
Copy link
Copy Markdown
Contributor

@xiaoyachong , thanks. This feels mergable to me. Do you agree?

Let's see if @taxe10 has any comments, and then we can merge it. Tanny and I have scheduled commit day next Monday to merge all the changes made for SMI. We can include this then if that works for you.

@dylanmcreynolds
Copy link
Copy Markdown
Member Author

Great.

@taxe10
Copy link
Copy Markdown
Member

taxe10 commented Aug 18, 2025

Looks great! One thought - should this storage live in Tiled instead of a separate DB? A custom DB gives nice querying options (like by timestamp), but what we really need right now is experiment replay. Are the feature vectors meant to persist across experiments, or just capture the latest results?

Since the offline version already uses Tiled, it might be cleaner to keep things consistent across both modes. My only concern is whether Tiled can handle complex queries as @dylanmcreynolds mentioned before, e.g., timestamps, energy ranges or other metadata, as flexibly as a DB

@dylanmcreynolds
Copy link
Copy Markdown
Member Author

@taxe10 yeah, I've thought about this a lot. I think I've landed on "the two aren't mutually exclusive". While the vectors probably more appropriately belong in Tiled, they don't take much space here. This publisher and a publisher that pushes results to tiled can happily co-exists.

Tiled has pretty good query options, but right now it would take customization of the tiled database to add indexes where appropriate if we wanted to ask complicated questions like "show me all runs where the second dim reduction took over 100 milliseconds".

@xiaoyachong
Copy link
Copy Markdown
Contributor

xiaoyachong commented Aug 21, 2025

@taxe10 @dylanmcreynolds
Thanks for your comments.
I added a Tiled results publisher to store the same results to a local tiled server in my latest commit. The hierarchy is organized in Tiled as follows:

TILED SERVER
|
`-- lse_live_results (root container)
    |
    `-- daily_run_2025-08-20 (daily container, named by current date)
        |
        |-- 12345678-1234-1234-1234-123456789abc (UUID table 1)
        |   `-- DataFrame with feature vectors and metadata
        |
        |-- 87654321-4321-4321-4321-cba987654321 (UUID table 2)
        |   `-- DataFrame with feature vectors and metadata
        |
        |-- feature_vectors (default table for events without UUID)
        |   `-- DataFrame with feature vectors and metadata
        |
        `-- ... (more UUID tables)

Copy link
Copy Markdown
Member Author

@dylanmcreynolds dylanmcreynolds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tiled publisher is great!! I would like to think of ways to make an abstract class for tiled publishers that does some of this work...especially all the handing off of async/threads that needs to be done to talk to the non-async tiled client. But we can do that refactor in the future.

For now, though, we need to change things to not call tiled client or numpy from with async directly.

Comment thread src/arroyo_reduction/tiled_results_publisher.py Outdated
Comment thread src/arroyo_reduction/tiled_results_publisher.py
Comment thread src/arroyo_reduction/tiled_results_publisher.py
import traceback
logger.error(traceback.format_exc())

async def _write_table_to_tiled(self, table_key):
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in a thread

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modify the threading as you suggested. Meanwhile, I’d like to mention that I save data to Tiled at the scan level rather than for each individual message, which differs from writing to a local .db. A write occurs only when a new scan (new UUID) is received, with each UUID typically containing 30–200 messages. This simplifies the process since we don’t need to search for the current table and append. Do you think this is a good approach?

Comment thread src/arroyo_reduction/tiled_results_publisher.py
@dylanmcreynolds
Copy link
Copy Markdown
Member Author

Changes look good to me.

@dylanmcreynolds
Copy link
Copy Markdown
Member Author

@xiaoyachong is this mergable?

@taxe10
Copy link
Copy Markdown
Member

taxe10 commented Sep 5, 2025

Overall, the changes look good to me. Quick question - does REMOTE_DATA_TILED_KEY refer to the tiled server used for live mode?
Also, as this repo is growing, we should add more documentation about these different pieces - and perhaps a few of these can belong to another repo in the future

@xiaoyachong
Copy link
Copy Markdown
Contributor

Overall, the changes look good to me. Quick question - does REMOTE_DATA_TILED_KEY refer to the tiled server used for live mode? Also, as this repo is growing, we should add more documentation about these different pieces - and perhaps a few of these can belong to another repo in the future

Yes. This is the key for the remote NSLS-II tiled server.

@xiaoyachong
Copy link
Copy Markdown
Contributor

@xiaoyachong is this mergable?

Hi Dylan. I think so.

@taxe10 taxe10 merged commit 5e3ebcc into main Sep 10, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants