Fix llm 01 patch by TobeTek · Pull Request #123 · neomatrix369/learning-path-index

TobeTek · 2024-12-02T12:58:29Z

Summary by Sourcery

Introduce a new Qooper scraper module with scripts for data retrieval and token automation, and improve documentation and formatting in the llm-poc-variant-01 project.

New Features:

Introduce a new Qooper scraper module to interact with the Qooper API, including scripts for scraping group data, discussions, and events.
Add a script to automate the retrieval of authentication tokens using Selenium for the Qooper scraper.

Enhancements:

Correct formatting issues in the README.md of the llm-poc-variant-01 project, improving readability and consistency.

Build:

Update the chromadb dependency version in the llm-poc-variant-01 requirements.txt file.

Documentation:

Add a README.md for the new Qooper scraper module, detailing setup instructions and usage.

sourcery-ai · 2024-12-02T12:58:34Z

Reviewer's Guide by Sourcery

This PR includes several documentation improvements, dependency updates, and adds a new Qooper scraper application. The main changes involve fixing formatting and links in the LLM POC README, adjusting Docker configuration formatting, downgrading chromadb version, and introducing a new scraping tool for the Qooper platform.

Class diagram for Qooper scraper application

classDiagram
    class Endpoints {
        +String Programs
        +String Groups
        +String GroupsJoined
        +String Members
        +String GroupDiscussions
        +String GroupEventsUpcoming
        +String GroupEventsPast
    }

    class Group {
        +int id
        +String name
        +String description
        +String image_url
        +String kind
        +List~String~ tags
        +bool is_joined
        +int members_count
        +int resources_count
        +int discussions_count
        +int past_events_count
        +int upcoming_events_count
        +String created_at
    }

    class GroupList {
        +List~Group~ root
    }

    class GroupMember {
        +int id
        +String first_name
        +String last_name
        +String image_url
        +String current_position
        +String current_organization
        +List~String~ roles
    }

    class GroupMemberList {
        +List~GroupMember~ root
    }

    class GroupEvent {
        +int id
        +String title
        +String timezone
        +String start_time
        +String end_time
        +String image_url
        +String updated_at
        +String created_at
        +String address
        +GroupMember publisher
    }

    class GroupEventList {
        +List~GroupEvent~ root
    }

    class Discussion {
        +int id
        +int comments_count
        +int upvotes_count
        +int attachments_count
        +String title
        +String description
        +String created_at
        +GroupMember publisher
        +List~String~ tags
        +List~int~ upvotes_user_ids
        +List~int~ comments
        +bool is_comment
    }

    class DiscussionList {
        +List~Discussion~ root
    }

File-Level Changes

Change	Details	Files
Documentation improvements and formatting fixes in LLM POC README	Fixed markdown formatting and indentation Corrected broken Ollama download link Added section about running the Chainlit UI with demo GIF Removed unnecessary blank lines	`app/llm-poc-variant-01/README.md`
Docker configuration formatting adjustments	Improved code indentation Reorganized docker run command arguments for better readability	`app/llm-poc-variant-01/docker/run-docker-container.sh`
Dependency version update	Downgraded chromadb from version 0.5.20 to 0.3.26	`app/llm-poc-variant-01/requirements.txt`
Added new Qooper scraper application	Implemented main scraping script with API endpoints and data extraction Added token retrieval script using Selenium for authentication Created Pydantic models for data validation Added documentation and setup instructions Included requirements.txt for dependencies	`app/qooper-scraper/scrape.py` `app/qooper-scraper/get_token.py` `app/qooper-scraper/schemas.py` `app/qooper-scraper/requirements.txt` `app/qooper-scraper/README.md` `app/qooper-scraper/.env.example` `app/qooper-scraper/.gitignore`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time. You can also use
this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey @TobeTek - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

Avoid storing authentication tokens in plaintext files (link)

Overall Comments:

Can you explain the rationale for downgrading chromadb from 0.5.20 to 0.3.26? This is quite a significant version change.

Here's what I looked at during the review

🟡 General issues: 1 issue found
🔴 Security: 1 blocking issue
🟢 Testing: all looks good
🟡 Complexity: 1 issue found
🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2024-12-02T12:59:29Z

+with open("./qooper-token.txt") as f:
+    qooper_token = f.read().strip()
+
+session = requests.Session()


suggestion: Add error handling for API requests

Consider adding try/except blocks and retries for API requests to handle temporary failures gracefully.

session = requests.Session() retries = Retry(total=3, backoff_factor=0.5, status_forcelist=[500, 502, 503, 504]) session.mount('https://', HTTPAdapter(max_retries=retries))

sourcery-ai · 2024-12-02T12:59:30Z

+    auth_token = driver.execute_script("return localStorage.getItem('qooper_atfu');")
+    auth_token = json.loads(auth_token)
+    print('Auth token: ', f"{auth_token[:4]}...{auth_token[-4:]}")
+    with open('qooper-token.txt', 'w') as f:


🚨 issue (security): Avoid storing authentication tokens in plaintext files

Consider using environment variables or a secure credential store instead of writing tokens to disk.

sourcery-ai · 2024-12-02T12:59:30Z

+
+# For group, get all discussions
+# NOTE: Only gets messages from groups the token grants access to
+all_discussions = []


issue (complexity): Consider extracting discussion fetching logic into a dedicated function to improve code organization.

The discussion/comment fetching logic can be simplified by extracting it into a focused function that handles the batching more cleanly. Here's a suggested refactor:

def fetch_group_discussions(session, group_id): """Fetch all discussions and their comments for a group.""" discussions = session.get( Endpoints.GroupDiscussions.format(group_id=group_id) ).json()["data"] result = [] for discussion in discussions: # Fetch full discussion if it has 3+ comments if len(discussion["comments"]) >= 3: discussion = session.get( Endpoints.GroupDiscussions.format( group_id=group_id + f'/{discussion["id"]}' ) ).json()["data"] # Add discussion with processed comments result.append({ **discussion, "comments": [ {**comment, "discussion_id": discussion["id"]} for comment in discussion["comments"] ] }) return result # Main script usage all_discussions = [] for group in groups_joined: group_discussions = fetch_group_discussions(session, group["id"]) all_discussions.extend(group_discussions)

This approach:

Separates fetching logic into a focused function

Maintains consistent data structures without mixing comment IDs and full objects

Makes the relationship between discussions and comments clearer

Reduces nesting depth and improves readability

sourcery-ai · 2024-12-02T12:59:30Z

+def login_to_qooper(driver: webdriver.Chrome):
+    driver.get(QOOPER_HOME_PAGE)
+
+    input_email = wait_for_element_visible(


issue (code-quality): Extract duplicate code into function (extract-duplicate-method)

sourcery-ai · 2024-12-02T12:59:30Z

+
+all_groups = []
+for group in raw_groups:
+    group = session.get(Endpoints.Groups + f'/{group["id"]}').json()["data"]


suggestion (code-quality): Use f-string instead of string concatenation (use-fstring-for-concatenation)

Suggested change

group = session.get(Endpoints.Groups + f'/{group["id"]}').json()["data"]

group = session.get(f'{Endpoints.Groups}/{group["id"]}').json()["data"]

TobeTek added 2 commits November 30, 2024 23:44

feat: Implement Qooper web scraper

a80f401

fix: Fix dependency conflicts in LLM PoC-1

a656f86

TobeTek requested a review from neomatrix369 as a code owner December 2, 2024 12:58

sourcery-ai Bot reviewed Dec 2, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix llm 01 patch#123

Fix llm 01 patch#123
TobeTek wants to merge 2 commits intomainfrom
fix-llm-01-patch

TobeTek commented Dec 2, 2024 •

edited by sourcery-ai Bot

Loading

Uh oh!

sourcery-ai Bot commented Dec 2, 2024 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

sourcery-ai Bot Dec 2, 2024

Uh oh!

sourcery-ai Bot Dec 2, 2024

Uh oh!

sourcery-ai Bot Dec 2, 2024

Uh oh!

sourcery-ai Bot Dec 2, 2024

Uh oh!

sourcery-ai Bot Dec 2, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	group = session.get(Endpoints.Groups + f'/{group["id"]}').json()["data"]
	group = session.get(f'{Endpoints.Groups}/{group["id"]}').json()["data"]

Conversation

TobeTek commented Dec 2, 2024 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

sourcery-ai Bot commented Dec 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide by Sourcery

Class diagram for Qooper scraper application

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Dec 2, 2024

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Dec 2, 2024

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Dec 2, 2024

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Dec 2, 2024

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Dec 2, 2024

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

TobeTek commented Dec 2, 2024 •

edited by sourcery-ai Bot

Loading

sourcery-ai Bot commented Dec 2, 2024 •

edited

Loading