
fix: Few Missing S3 URIs While Merging Trees #23

Merged
rdheekonda merged 2 commits into main from users/raja/fix-artifact-few-missing-object-uris
Apr 24, 2025

rdheekonda (Contributor) commented Apr 24, 2025

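Fix a Few Missing S3 URIs While Merging Trees

Key Changes:

  • Fix the tree-merging logic so that every file node, including duplicate files that share a content hash with another file, gets its S3 URI

The merged tree now comes out correctly, with an S3 URI on every file node: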

[
  {
    "type": "dir",
    "dir_path": "/Users/raja/Desktop/dreadnode/data",
    "hash": "7cc6d39f627e1b74",
    "children":
      [
        {
          "type": "file",
          "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/bc742c5207207677",
          "hash": "bc742c5207207677",
          "size_bytes": 8196,
          "final_real_path": "/Users/raja/Desktop/dreadnode/data/.DS_Store",
        },
        {
          "type": "file",
          "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/76063a6c09c92b31.numbers",
          "hash": "76063a6c09c92b31",
          "size_bytes": 481570,
          "final_real_path": "/Users/raja/Desktop/dreadnode/data/text_to_text_modality.csv.numbers",
        },
        {
          "type": "file",
          "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/77708d10a3a8dc20.rtf",
          "hash": "77708d10a3a8dc20",
          "size_bytes": 17962,
          "final_real_path": '/Users/raja/Desktop/dreadnode/data/"sample_logfire_raw_data.json".rtf',
        },
        {
          "type": "file",
          "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/d7f7fbc089b682ce.jpg",
          "hash": "d7f7fbc089b682ce",
          "size_bytes": 154256,
          "final_real_path": "/Users/raja/Desktop/dreadnode/data/good-morning-new-york.jpg",
        },
        {
          "type": "file",
          "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/a55e5ed05e21d118.pkl",
          "hash": "a55e5ed05e21d118",
          "size_bytes": 1101,
          "final_real_path": "/Users/raja/Desktop/dreadnode/data/model.pkl",
        },
        {
          "type": "file",
          "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/1e67d825138c45bf.jpg",
          "hash": "1e67d825138c45bf",
          "size_bytes": 5245329,
          "final_real_path": "/Users/raja/Desktop/dreadnode/data/Snake_River.jpg",
        },
        {
          "type": "dir",
          "dir_path": "/Users/raja/Desktop/dreadnode/data/video",
          "hash": "680dc7a6ee7ed434",
          "children":
            [
              {
                "type": "file",
                "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/729e49981c410f98",
                "hash": "729e49981c410f98",
                "size_bytes": 6148,
                "final_real_path": "/Users/raja/Desktop/dreadnode/data/video/.DS_Store",
              },
              {
                "type": "file",
                "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/573400d5d649e6ce.avi",
                "hash": "573400d5d649e6ce",
                "size_bytes": 2279794,
                "final_real_path": "/Users/raja/Desktop/dreadnode/data/video/file_example_AVI_1920_2_3MG.avi",
              },
              {
                "type": "file",
                "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/6ff2bc6e1edda876.webm",
                "hash": "6ff2bc6e1edda876",
                "size_bytes": 1417829,
                "final_real_path": "/Users/raja/Desktop/dreadnode/data/video/file_example_WEBM_640_1_4MB.webm",
              },
              {
                "type": "file",
                "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/7cff8d8e99358dff.mov",
                "hash": "7cff8d8e99358dff",
                "size_bytes": 2247200,
                "final_real_path": "/Users/raja/Desktop/dreadnode/data/video/file_example_MOV_1920_2_2MB.mov",
              },
              {
                "type": "file",
                "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/1385f0808b7ef160.mp4",
                "hash": "1385f0808b7ef160",
                "size_bytes": 3114374,
                "final_real_path": "/Users/raja/Desktop/dreadnode/data/video/file_example_MP4_640_3MG.mp4",
              },
              {
                "type": "file",
                "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/1c4d2e7cce26714d.wmv",
                "hash": "1c4d2e7cce26714d",
                "size_bytes": 1175629,
                "final_real_path": "/Users/raja/Desktop/dreadnode/data/video/file_example_WMV_480_1_2MB.wmv",
              },
              {
                "type": "file",
                "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/b4c1db2e6cc0c62c.mp3",
                "hash": "b4c1db2e6cc0c62c",
                "size_bytes": 2158877,
                "final_real_path": "/Users/raja/Desktop/dreadnode/data/video/file_example_MP3_2MG.mp3",
              },
            ],
        },
        {
          "type": "dir",
          "dir_path": "/Users/raja/Desktop/dreadnode/data/audio",
          "hash": "3819722ab4c308f8",
          "children":
            [
              {
                "type": "file",
                "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/2f56a18fd3645518",
                "hash": "2f56a18fd3645518",
                "size_bytes": 8196,
                "final_real_path": "/Users/raja/Desktop/dreadnode/data/audio/.DS_Store",
              },
              {
                "type": "dir",
                "dir_path": "/Users/raja/Desktop/dreadnode/data/audio/subaudio",
                "hash": "f6c5cd4750ea0651",
                "children":
                  [
                    {
                      "type": "file",
                      "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/4701cde18ea10f33.wav",
                      "hash": "4701cde18ea10f33",
                      "size_bytes": 2104474,
                      "final_real_path": "/Users/raja/Desktop/dreadnode/data/audio/subaudio/file_example_WAV_2MG.wav",
                    },
                    {
                      "type": "file",
                      "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/b4c1db2e6cc0c62c.mp3",
                      "hash": "b4c1db2e6cc0c62c",
                      "size_bytes": 2158877,
                      "final_real_path": "/Users/raja/Desktop/dreadnode/data/audio/subaudio/file_example_MP3_2MG.mp3",
                    },
                  ],
              },
              {
                "type": "dir",
                "dir_path": "/Users/raja/Desktop/dreadnode/data/audio/copied",
                "hash": "3e50bdfcc0bba5b5",
                "children":
                  [
                    {
                      "type": "file",
                      "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/0a0c0ecea6eeecde",
                      "hash": "0a0c0ecea6eeecde",
                      "size_bytes": 6148,
                      "final_real_path": "/Users/raja/Desktop/dreadnode/data/audio/copied/.DS_Store",
                    },
                    {
                      "type": "dir",
                      "dir_path": "/Users/raja/Desktop/dreadnode/data/audio/copied/subaudio",
                      "hash": "f6c5cd4750ea0651",
                      "children":
                        [
                          {
                            "type": "file",
                            "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/4701cde18ea10f33.wav",
                            "hash": "4701cde18ea10f33",
                            "size_bytes": 2104474,
                            "final_real_path": "/Users/raja/Desktop/dreadnode/data/audio/copied/subaudio/file_example_WAV_2MG.wav",
                          },
                          {
                            "type": "file",
                            "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/b4c1db2e6cc0c62c.mp3",
                            "hash": "b4c1db2e6cc0c62c",
                            "size_bytes": 2158877,
                            "final_real_path": "/Users/raja/Desktop/dreadnode/data/audio/copied/subaudio/file_example_MP3_2MG.mp3",
                          },
                        ],
                    },
                    {
                      "type": "dir",
                      "dir_path": "/Users/raja/Desktop/dreadnode/data/audio/copied/subaudio2",
                      "hash": "be2e147d8cf3ed05",
                      "children":
                        [
                          {
                            "type": "file",
                            "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/5940a43cb4eda3b4.ogg",
                            "hash": "5940a43cb4eda3b4",
                            "size_bytes": 2097880,
                            "final_real_path": "/Users/raja/Desktop/dreadnode/data/audio/copied/subaudio2/file_example_OOG_2MG.ogg",
                          },
                        ],
                    },
                  ],
              },
              {
                "type": "dir",
                "dir_path": "/Users/raja/Desktop/dreadnode/data/audio/subaudio2",
                "hash": "be2e147d8cf3ed05",
                "children":
                  [
                    {
                      "type": "file",
                      "uri": "s3://user-data/9c528176-bfbc-47fb-874f-851961acc413/artifacts/5940a43cb4eda3b4.ogg",
                      "hash": "5940a43cb4eda3b4",
                      "size_bytes": 2097880,
                      "final_real_path": "/Users/raja/Desktop/dreadnode/data/audio/subaudio2/file_example_OOG_2MG.ogg",
                    },
                  ],
              },
            ],
        },
      ],
  },
]
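A quick way to sanity-check output like the tree above is to walk it and flag any file node without an s3:// URI, or any content hash that maps to more than one URI. The sketch below assumes the tree is loaded as plain Python dicts and lists shaped as printed; iter_files and find_uri_problems are hypothetical helpers written for illustration, not part of the dreadnode codebase.

```python
from collections import defaultdict
from typing import Any, Iterator

# Hypothetical node type: a dict shaped like the tree printed above
# ("type" is "file" or "dir"; dirs carry "children", files carry "uri"/"hash").
Node = dict[str, Any]


def iter_files(node: Node) -> Iterator[Node]:
    """Yield every file node under `node`, depth-first."""
    if node["type"] == "file":
        yield node
    else:
        for child in node.get("children", []):
            yield from iter_files(child)


def find_uri_problems(roots: list[Node]) -> list[str]:
    """Report file nodes without an S3 URI and hashes that map to several URIs."""
    problems: list[str] = []
    uris_by_hash: dict[str, set[str]] = defaultdict(set)
    for root in roots:
        for f in iter_files(root):
            uri = f.get("uri")
            if not uri or not uri.startswith("s3://"):
                problems.append(f"missing S3 URI: {f['final_real_path']}")
            else:
                uris_by_hash[f["hash"]].add(uri)
    # Identical content (same hash) should point at a single uploaded object.
    for digest, uris in uris_by_hash.items():
        if len(uris) > 1:
            problems.append(f"hash {digest} maps to {len(uris)} different URIs")
    return problems


# Example: one file is missing its URI, and two copies of hash "X" disagree.
example = [{
    "type": "dir", "dir_path": "/data", "hash": "d0", "children": [
        {"type": "file", "final_real_path": "/data/a.mp3", "hash": "X", "uri": "s3://bucket/X.mp3"},
        {"type": "file", "final_real_path": "/data/b.mp3", "hash": "X"},
        {"type": "file", "final_real_path": "/data/c.mp3", "hash": "X", "uri": "s3://bucket/other"},
    ],
}]
for line in find_uri_problems(example):
    print(line)
# prints:
# missing S3 URI: /data/b.mp3
# hash X maps to 2 different URIs
```

For the tree printed above, every file node carries an s3:// URI and repeated hashes share one URI (for example, the three copies of file_example_MP3_2MG.mp3 all point at .../artifacts/b4c1db2e6cc0c62c.mp3), so this check would be expected to return an empty list.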

Generated Summary:

  • Refactored the URI propagation logic so that files with identical hashes get consistent URIs.
  • Introduced a _propagate_uris_by_hash() method that propagates URIs to all files sharing the same hash.
  • Modified the merge() method to call _propagate_uris_by_hash() after building the path and hash maps.
  • Simplified the hash comparison: a single hash-equality check now decides whether to propagate the existing URI (hashes match) or update the file (hashes differ).
  • Clarified the file-merging code while ensuring URIs are correctly assigned to files with matching hashes.
  • Impact: duplicate files in a merge are handled better, and their metadata, including the S3 URI, stays consistent.

This summary was generated with ❤️ by rigging
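The summary describes the fix only at a high level. Below is a minimal sketch of the idea, assuming file and directory nodes are dicts shaped like the tree above: directories are merged recursively by path, a single hash-equality check decides whether an incoming file reuses the existing URI or replaces the old node, and a final pass propagates URIs to every file that shares a hash with a file that already has one. The names merge, merge_dirs, node_key, propagate_uris_by_hash and iter_files are hypothetical; this is not the actual dreadnode implementation of merge() or _propagate_uris_by_hash().

```python
from typing import Any, Iterator

# Hypothetical node type: a dict shaped like the tree printed above.
Node = dict[str, Any]


def iter_files(node: Node) -> Iterator[Node]:
    """Yield every file node under `node`, depth-first."""
    if node["type"] == "file":
        yield node
    else:
        for child in node.get("children", []):
            yield from iter_files(child)


def node_key(node: Node) -> str:
    """Path used to match nodes across the two trees."""
    return node["final_real_path"] if node["type"] == "file" else node["dir_path"]


def propagate_uris_by_hash(root: Node) -> None:
    """In place: give every file lacking a URI the URI of a same-hash file."""
    uri_by_hash: dict[str, str] = {}
    for f in iter_files(root):
        if f.get("uri"):
            uri_by_hash.setdefault(f["hash"], f["uri"])
    for f in iter_files(root):
        if not f.get("uri") and f["hash"] in uri_by_hash:
            f["uri"] = uri_by_hash[f["hash"]]


def merge_dirs(old: Node, new: Node) -> Node:
    """Recursively merge two directory nodes, matching children by path."""
    by_key = {node_key(c): c for c in old.get("children", [])}
    for child in new.get("children", []):
        key = node_key(child)
        prev = by_key.get(key)
        if prev is None or prev["type"] != child["type"]:
            by_key[key] = child  # new path (or file/dir swap): take incoming node
        elif child["type"] == "dir":
            by_key[key] = merge_dirs(prev, child)
        elif prev["hash"] == child["hash"]:
            # Same content: keep the incoming node but reuse the uploaded URI.
            by_key[key] = {**child, "uri": child.get("uri") or prev.get("uri")}
        else:
            by_key[key] = child  # content changed: the incoming file wins
    return {**new, "children": list(by_key.values())}


def merge(old: Node, new: Node) -> Node:
    """Merge two trees, then fill any remaining URI gaps by content hash.

    Note: propagation mutates file nodes that are shared with `new`.
    """
    merged = merge_dirs(old, new)
    propagate_uris_by_hash(merged)
    return merged


# Usage: the same content appears at the old path and in a new "copy" directory.
old = {"type": "dir", "dir_path": "/data", "hash": "d0", "children": [
    {"type": "file", "final_real_path": "/data/a.mp3", "hash": "X", "uri": "s3://bucket/X.mp3"},
]}
new = {"type": "dir", "dir_path": "/data", "hash": "d1", "children": [
    {"type": "file", "final_real_path": "/data/a.mp3", "hash": "X"},
    {"type": "dir", "dir_path": "/data/copy", "hash": "d2", "children": [
        {"type": "file", "final_real_path": "/data/copy/a.mp3", "hash": "X"},
    ]},
]}
merged = merge(old, new)
print([f.get("uri") for f in iter_files(merged)])
# prints: ['s3://bucket/X.mp3', 's3://bucket/X.mp3']
```

In this sketch, the final hash pass is what gives a URI to copies that live under a different path (like the MP3 under video/, audio/subaudio/ and audio/copied/subaudio/ in the output above); matching by path alone would miss them.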

rdheekonda requested a review from monoxgas on April 24, 2025 21:09
rdheekonda changed the title from "Fix Few Missing S3 URIs While Merging Trees" to "Fix: Few Missing S3 URIs While Merging Trees" on Apr 24, 2025
rdheekonda changed the title from "Fix: Few Missing S3 URIs While Merging Trees" to "fix: Few Missing S3 URIs While Merging Trees" on Apr 24, 2025
rdheekonda merged commit c443c14 into main on Apr 24, 2025
7 checks passed
monoxgas deleted the users/raja/fix-artifact-few-missing-object-uris branch on April 25, 2025 00:38
