added script to fetch all relevant graph nodes from a given decade#3
added script to fetch all relevant graph nodes from a given decade#3
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds functionality to fetch graph nodes (OpenAlex works) from a specific decade along with their in-decade references. This enables retrieval of citation network data for temporal analysis.
- Added
fetch_per_decade_datamethod toCockroachClientto query works by decade - Added comprehensive test coverage for the new fetching functionality
- Properly uses parameterized queries and SQL identifiers to prevent injection vulnerabilities
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| hvectorspaces/io/cockroach_client.py | Implements fetch_per_decade_data method to query works within a specified decade with configurable additional fields |
| tests/test_crdb_client.py | Adds test_fetch_in_decade_references to validate decade-based data fetching, including verification of in-decade reference relationships |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| assert ( | ||
| set(referenced_works) | ||
| .intersection(set(in_decade_references)) | ||
| .issubset(oa_ids) | ||
| ) |
There was a problem hiding this comment.
This assertion is redundant and potentially confusing. By definition, if in_decade_references is a subset of oa_ids (as checked on line 79), then the intersection of any set with in_decade_references will also be a subset of oa_ids. This assertion doesn't add meaningful test coverage and could be removed to simplify the test logic.
| assert ( | |
| set(referenced_works) | |
| .intersection(set(in_decade_references)) | |
| .issubset(oa_ids) | |
| ) | |
| # The following assertion is redundant and has been removed. |
There was a problem hiding this comment.
@copilot it was a mistake. I changed it to the actual test I wanted (that is, in_decade_references contains all the referenced works that are in oa_ids).
assert (
set(referenced_works)
.intersection(oa_ids)
.issubset(set(in_decade_references))
)
What do you think?
No description provided.