Skip to content

added script to fetch all relevant graph nodes from a given decade#3

Merged
FPreta merged 2 commits intomainfrom
feature/add_fetch_perdecade_data
Nov 17, 2025
Merged

added script to fetch all relevant graph nodes from a given decade#3
FPreta merged 2 commits intomainfrom
feature/add_fetch_perdecade_data

Conversation

@FPreta
Copy link
Owner

@FPreta FPreta commented Nov 17, 2025

No description provided.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds functionality to fetch graph nodes (OpenAlex works) from a specific decade along with their in-decade references. This enables retrieval of citation network data for temporal analysis.

  • Added fetch_per_decade_data method to CockroachClient to query works by decade
  • Added comprehensive test coverage for the new fetching functionality
  • Properly uses parameterized queries and SQL identifiers to prevent injection vulnerabilities

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
hvectorspaces/io/cockroach_client.py Implements fetch_per_decade_data method to query works within a specified decade with configurable additional fields
tests/test_crdb_client.py Adds test_fetch_in_decade_references to validate decade-based data fetching, including verification of in-decade reference relationships

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +80 to +84
assert (
set(referenced_works)
.intersection(set(in_decade_references))
.issubset(oa_ids)
)
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is redundant and potentially confusing. By definition, if in_decade_references is a subset of oa_ids (as checked on line 79), then the intersection of any set with in_decade_references will also be a subset of oa_ids. This assertion doesn't add meaningful test coverage and could be removed to simplify the test logic.

Suggested change
assert (
set(referenced_works)
.intersection(set(in_decade_references))
.issubset(oa_ids)
)
# The following assertion is redundant and has been removed.

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot it was a mistake. I changed it to the actual test I wanted (that is, in_decade_references contains all the referenced works that are in oa_ids).

assert (
                set(referenced_works)
                .intersection(oa_ids)
                .issubset(set(in_decade_references))
            )

What do you think?

Copy link
Contributor

Copilot AI commented Nov 17, 2025

@FPreta I've opened a new pull request, #4, to work on those changes. Once the pull request is ready, I'll request review from you.

@FPreta FPreta merged commit df88ede into main Nov 17, 2025
@FPreta FPreta deleted the feature/add_fetch_perdecade_data branch November 17, 2025 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants