Skip to content

Fix llm 01 patch#123

Open
TobeTek wants to merge 2 commits intomainfrom
fix-llm-01-patch
Open

Fix llm 01 patch#123
TobeTek wants to merge 2 commits intomainfrom
fix-llm-01-patch

Conversation

@TobeTek
Copy link
Copy Markdown
Collaborator

@TobeTek TobeTek commented Dec 2, 2024

Summary by Sourcery

Introduce a new Qooper scraper module with scripts for data retrieval and token automation, and improve documentation and formatting in the llm-poc-variant-01 project.

New Features:

  • Introduce a new Qooper scraper module to interact with the Qooper API, including scripts for scraping group data, discussions, and events.
  • Add a script to automate the retrieval of authentication tokens using Selenium for the Qooper scraper.

Enhancements:

  • Correct formatting issues in the README.md of the llm-poc-variant-01 project, improving readability and consistency.

Build:

  • Update the chromadb dependency version in the llm-poc-variant-01 requirements.txt file.

Documentation:

  • Add a README.md for the new Qooper scraper module, detailing setup instructions and usage.

@TobeTek TobeTek requested a review from neomatrix369 as a code owner December 2, 2024 12:58
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Dec 2, 2024

Reviewer's Guide by Sourcery

This PR includes several documentation improvements, dependency updates, and adds a new Qooper scraper application. The main changes involve fixing formatting and links in the LLM POC README, adjusting Docker configuration formatting, downgrading chromadb version, and introducing a new scraping tool for the Qooper platform.

Class diagram for Qooper scraper application

classDiagram
    class Endpoints {
        +String Programs
        +String Groups
        +String GroupsJoined
        +String Members
        +String GroupDiscussions
        +String GroupEventsUpcoming
        +String GroupEventsPast
    }

    class Group {
        +int id
        +String name
        +String description
        +String image_url
        +String kind
        +List~String~ tags
        +bool is_joined
        +int members_count
        +int resources_count
        +int discussions_count
        +int past_events_count
        +int upcoming_events_count
        +String created_at
    }

    class GroupList {
        +List~Group~ root
    }

    class GroupMember {
        +int id
        +String first_name
        +String last_name
        +String image_url
        +String current_position
        +String current_organization
        +List~String~ roles
    }

    class GroupMemberList {
        +List~GroupMember~ root
    }

    class GroupEvent {
        +int id
        +String title
        +String timezone
        +String start_time
        +String end_time
        +String image_url
        +String updated_at
        +String created_at
        +String address
        +GroupMember publisher
    }

    class GroupEventList {
        +List~GroupEvent~ root
    }

    class Discussion {
        +int id
        +int comments_count
        +int upvotes_count
        +int attachments_count
        +String title
        +String description
        +String created_at
        +GroupMember publisher
        +List~String~ tags
        +List~int~ upvotes_user_ids
        +List~int~ comments
        +bool is_comment
    }

    class DiscussionList {
        +List~Discussion~ root
    }
Loading

File-Level Changes

Change Details Files
Documentation improvements and formatting fixes in LLM POC README
  • Fixed markdown formatting and indentation
  • Corrected broken Ollama download link
  • Added section about running the Chainlit UI with demo GIF
  • Removed unnecessary blank lines
app/llm-poc-variant-01/README.md
Docker configuration formatting adjustments
  • Improved code indentation
  • Reorganized docker run command arguments for better readability
app/llm-poc-variant-01/docker/run-docker-container.sh
Dependency version update
  • Downgraded chromadb from version 0.5.20 to 0.3.26
app/llm-poc-variant-01/requirements.txt
Added new Qooper scraper application
  • Implemented main scraping script with API endpoints and data extraction
  • Added token retrieval script using Selenium for authentication
  • Created Pydantic models for data validation
  • Added documentation and setup instructions
  • Included requirements.txt for dependencies
app/qooper-scraper/scrape.py
app/qooper-scraper/get_token.py
app/qooper-scraper/schemas.py
app/qooper-scraper/requirements.txt
app/qooper-scraper/README.md
app/qooper-scraper/.env.example
app/qooper-scraper/.gitignore

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time. You can also use
    this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @TobeTek - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

  • Avoid storing authentication tokens in plaintext files (link)

Overall Comments:

  • Can you explain the rationale for downgrading chromadb from 0.5.20 to 0.3.26? This is quite a significant version change.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🔴 Security: 1 blocking issue
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

with open("./qooper-token.txt") as f:
qooper_token = f.read().strip()

session = requests.Session()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Add error handling for API requests

Consider adding try/except blocks and retries for API requests to handle temporary failures gracefully.

session = requests.Session()
retries = Retry(total=3, backoff_factor=0.5, status_forcelist=[500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

auth_token = driver.execute_script("return localStorage.getItem('qooper_atfu');")
auth_token = json.loads(auth_token)
print('Auth token: ', f"{auth_token[:4]}...{auth_token[-4:]}")
with open('qooper-token.txt', 'w') as f:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 issue (security): Avoid storing authentication tokens in plaintext files

Consider using environment variables or a secure credential store instead of writing tokens to disk.


# For group, get all discussions
# NOTE: Only gets messages from groups the token grants access to
all_discussions = []
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider extracting discussion fetching logic into a dedicated function to improve code organization.

The discussion/comment fetching logic can be simplified by extracting it into a focused function that handles the batching more cleanly. Here's a suggested refactor:

def fetch_group_discussions(session, group_id):
    """Fetch all discussions and their comments for a group."""
    discussions = session.get(
        Endpoints.GroupDiscussions.format(group_id=group_id)
    ).json()["data"]

    result = []
    for discussion in discussions:
        # Fetch full discussion if it has 3+ comments
        if len(discussion["comments"]) >= 3:
            discussion = session.get(
                Endpoints.GroupDiscussions.format(
                    group_id=group_id + f'/{discussion["id"]}'
                )
            ).json()["data"]

        # Add discussion with processed comments
        result.append({
            **discussion,
            "comments": [
                {**comment, "discussion_id": discussion["id"]}
                for comment in discussion["comments"]
            ]
        })

    return result

# Main script usage
all_discussions = []
for group in groups_joined:
    group_discussions = fetch_group_discussions(session, group["id"])
    all_discussions.extend(group_discussions)

This approach:

  1. Separates fetching logic into a focused function
  2. Maintains consistent data structures without mixing comment IDs and full objects
  3. Makes the relationship between discussions and comments clearer
  4. Reduces nesting depth and improves readability

def login_to_qooper(driver: webdriver.Chrome):
driver.get(QOOPER_HOME_PAGE)

input_email = wait_for_element_visible(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Extract duplicate code into function (extract-duplicate-method)


all_groups = []
for group in raw_groups:
group = session.get(Endpoints.Groups + f'/{group["id"]}').json()["data"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Use f-string instead of string concatenation (use-fstring-for-concatenation)

Suggested change
group = session.get(Endpoints.Groups + f'/{group["id"]}').json()["data"]
group = session.get(f'{Endpoints.Groups}/{group["id"]}').json()["data"]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant