Conversation
Reviewer's Guide by SourceryThis PR includes several documentation improvements, dependency updates, and adds a new Qooper scraper application. The main changes involve fixing formatting and links in the LLM POC README, adjusting Docker configuration formatting, downgrading chromadb version, and introducing a new scraping tool for the Qooper platform. Class diagram for Qooper scraper applicationclassDiagram
class Endpoints {
+String Programs
+String Groups
+String GroupsJoined
+String Members
+String GroupDiscussions
+String GroupEventsUpcoming
+String GroupEventsPast
}
class Group {
+int id
+String name
+String description
+String image_url
+String kind
+List~String~ tags
+bool is_joined
+int members_count
+int resources_count
+int discussions_count
+int past_events_count
+int upcoming_events_count
+String created_at
}
class GroupList {
+List~Group~ root
}
class GroupMember {
+int id
+String first_name
+String last_name
+String image_url
+String current_position
+String current_organization
+List~String~ roles
}
class GroupMemberList {
+List~GroupMember~ root
}
class GroupEvent {
+int id
+String title
+String timezone
+String start_time
+String end_time
+String image_url
+String updated_at
+String created_at
+String address
+GroupMember publisher
}
class GroupEventList {
+List~GroupEvent~ root
}
class Discussion {
+int id
+int comments_count
+int upvotes_count
+int attachments_count
+String title
+String description
+String created_at
+GroupMember publisher
+List~String~ tags
+List~int~ upvotes_user_ids
+List~int~ comments
+bool is_comment
}
class DiscussionList {
+List~Discussion~ root
}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey @TobeTek - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
- Avoid storing authentication tokens in plaintext files (link)
Overall Comments:
- Can you explain the rationale for downgrading chromadb from 0.5.20 to 0.3.26? This is quite a significant version change.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🔴 Security: 1 blocking issue
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| with open("./qooper-token.txt") as f: | ||
| qooper_token = f.read().strip() | ||
|
|
||
| session = requests.Session() |
There was a problem hiding this comment.
suggestion: Add error handling for API requests
Consider adding try/except blocks and retries for API requests to handle temporary failures gracefully.
session = requests.Session()
retries = Retry(total=3, backoff_factor=0.5, status_forcelist=[500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))| auth_token = driver.execute_script("return localStorage.getItem('qooper_atfu');") | ||
| auth_token = json.loads(auth_token) | ||
| print('Auth token: ', f"{auth_token[:4]}...{auth_token[-4:]}") | ||
| with open('qooper-token.txt', 'w') as f: |
There was a problem hiding this comment.
🚨 issue (security): Avoid storing authentication tokens in plaintext files
Consider using environment variables or a secure credential store instead of writing tokens to disk.
|
|
||
| # For group, get all discussions | ||
| # NOTE: Only gets messages from groups the token grants access to | ||
| all_discussions = [] |
There was a problem hiding this comment.
issue (complexity): Consider extracting discussion fetching logic into a dedicated function to improve code organization.
The discussion/comment fetching logic can be simplified by extracting it into a focused function that handles the batching more cleanly. Here's a suggested refactor:
def fetch_group_discussions(session, group_id):
"""Fetch all discussions and their comments for a group."""
discussions = session.get(
Endpoints.GroupDiscussions.format(group_id=group_id)
).json()["data"]
result = []
for discussion in discussions:
# Fetch full discussion if it has 3+ comments
if len(discussion["comments"]) >= 3:
discussion = session.get(
Endpoints.GroupDiscussions.format(
group_id=group_id + f'/{discussion["id"]}'
)
).json()["data"]
# Add discussion with processed comments
result.append({
**discussion,
"comments": [
{**comment, "discussion_id": discussion["id"]}
for comment in discussion["comments"]
]
})
return result
# Main script usage
all_discussions = []
for group in groups_joined:
group_discussions = fetch_group_discussions(session, group["id"])
all_discussions.extend(group_discussions)This approach:
- Separates fetching logic into a focused function
- Maintains consistent data structures without mixing comment IDs and full objects
- Makes the relationship between discussions and comments clearer
- Reduces nesting depth and improves readability
| def login_to_qooper(driver: webdriver.Chrome): | ||
| driver.get(QOOPER_HOME_PAGE) | ||
|
|
||
| input_email = wait_for_element_visible( |
There was a problem hiding this comment.
issue (code-quality): Extract duplicate code into function (extract-duplicate-method)
|
|
||
| all_groups = [] | ||
| for group in raw_groups: | ||
| group = session.get(Endpoints.Groups + f'/{group["id"]}').json()["data"] |
There was a problem hiding this comment.
suggestion (code-quality): Use f-string instead of string concatenation (use-fstring-for-concatenation)
| group = session.get(Endpoints.Groups + f'/{group["id"]}').json()["data"] | |
| group = session.get(f'{Endpoints.Groups}/{group["id"]}').json()["data"] |
Summary by Sourcery
Introduce a new Qooper scraper module with scripts for data retrieval and token automation, and improve documentation and formatting in the llm-poc-variant-01 project.
New Features:
Enhancements:
Build:
Documentation: