diff --git a/.github/workflows/add_new_mentors.yml b/.github/workflows/add_new_mentors.yml new file mode 100644 index 00000000..640ea23d --- /dev/null +++ b/.github/workflows/add_new_mentors.yml @@ -0,0 +1,76 @@ +name: Add Newly Accepted Mentors + +on: + workflow_dispatch: + inputs: + file_id: + description: "Google Drive file ID of the Excel sheet with the new mentors data" + required: true + current_period: + description: "Current period (specify \"long-term\" if during long-term mentorship registration; otherwise, default is used)" + required: false + default: 'default' + +jobs: + add-new-mentors: + runs-on: ubuntu-latest + + steps: + - name: Checkout repository + uses: actions/checkout@v5 + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.12' + + - name: Cache pip + uses: actions/cache@v4 + with: + path: ~/.cache/pip + key: ${{ runner.os }}-pip-${{ hashFiles('tools/requirements.txt') }} + restore-keys: | + ${{ runner.os }}-pip- + + - name: Install dependencies + run: | + python -m pip install --upgrade pip + pip install -r tools/requirements.txt + + - name: Install and Configure rclone with Google Cloud service account + run: | + curl https://rclone.org/install.sh | sudo bash + echo '${{ secrets.GOOGLECLOUD_SERVICE_KEY_RETRIEVE_ADHOC_FILE_JSON }}' > service_account.json + rclone config create gdrive drive scope=drive service_account_file=service_account.json + + - name: Download spreadsheet from Google Drive + run: | + rclone backend copyid gdrive: ${{ github.event.inputs.file_id }} tools/samples/new_mentors.xlsx + + - name: Run scripts to append new mentors to mentors.yml and download their profile picture(s) + run: | + cd tools + python3 automation_mentors.py samples/new_mentors.xlsx ../_data/mentors.yml ${{ github.event.inputs.current_period }} a 0 + python3 download_image.py samples/new_mentors.xlsx + + - name: Cleanup files + if: always() + run: rm -f service_account.json tools/samples/new_mentors.xlsx + + - name: Create 
or Update Pull Request + uses: peter-evans/create-pull-request@v7 + with: + token: ${{ secrets.GHA_ACTIONS_ALLOW_TOKEN }} + commit-message: "added new mentors" + branch: "automation/add-new-mentors" + team-reviewers: | + Women-Coding-Community/leaders + title: "[WCC Bot] Add New Mentors" + body: | + This PR was created automatically by a GitHub Action that handles adding new mentors. + `_data/mentors.yml` should be updated with the new data. Images for the new mentors should also have been downloaded and included in this PR. + + Please review the changes and ensure that the changes are as expected before merging. + labels: | + automation + new-mentors diff --git a/tools/README.md b/tools/README.md index c3d83541..5961bbf8 100644 --- a/tools/README.md +++ b/tools/README.md @@ -1,9 +1,8 @@ ## How to Run Python Scripts -There are two automation scripts: -1) `automation.py`: appends new mentors in `samples/mentors.xslx` to `_data/mentor.yml` +1) `automation_mentors.py`: appends new mentors in `samples/mentors.xlsx` to `_data/mentors.yml` (or updates all existing mentors if using WRITE mode) -2) `download_image.py`: downloads image from a specified URL and saves in `assets/images/mentors` +2) `download_image.py`: downloads the image for each mentor from a specified URL and saves it in `assets/images/mentors`. It uses data from the `Mentors Images` sheet of `samples/mentors.xlsx`. 3) `meetup_import.py`: imports new upcoming events from the WCC MeetUp page using the iCal feed: https://www.meetup.com/women-coding-community/events/ical/ @@ -17,21 +16,32 @@ python 3.11 or above ### How to Execute on Mac -#### A) `automation.py` +#### A) `automation_mentors.py` ```shell -sh run_automation.sh +sh run_mentor_automation.sh ``` -**Note:** -- Ensure to update `mentors.xslx` with the new spreadsheet containing the mentors to be added, **OR** -- adjust the `FILE_PATH_MENTORS_XLSX` parameter in [the script](run_automation.sh) to match the file path for the new spreadsheet. 
+**Notes:** +If running locally: +- Update the `WCC All Approved Mentors` sheet in `mentors.xlsx` with the data for the mentors to be added, **OR** +- If using another file source, adjust the `FILE_PATH_MENTORS_XLSX` parameter in [the script](run_mentor_automation.sh) to match the file path. +- If running this script during the long-term registration period, set the `CURRENT_PERIOD` parameter in [the script](run_mentor_automation.sh) to "long-term". + +- After running the script, you **MUST** run the [run_download_automation script](run_download_automation.sh) to download images for the new mentors. Otherwise, the image links will be broken because the image files do not exist yet. See the instructions for the download script below. + +**If using GitHub Actions**, note that the GHA workflow is **ONLY** for adding new mentors. +It uses a Google Cloud service account to retrieve the Excel file from Google Drive. The service key has been configured for the womencodingcommunity Google Drive account, and the file to be used/updated has been shared with the service account email. + Hence, to run the GHA workflow, you only need to provide: + - the file ID of the Excel sheet to use + - (Optional) the current period + +For more information on the Google Cloud service account configuration, see the [documentation](blog_automation/README.md) in the blog automation folder. 
#### B) `download_image.py` -**Before running the script, make sure** to update the `IMAGE_URL` and `MENTOR_NAME` parameters in the [run_download_automation script](run_download_automation.sh) with: -- the URL you want to download the mentor's image from, **AND** -- the mentor's name as it appears in the spreadsheet e.g 'Adriana Zencke' +**Before running the script, make sure** to update the `Mentors Images` sheet in `mentors.xlsx` with the data for the new mentors whose images you want to download. +If you want to use another file source, adjust the `XLSX_FILE_PATH` parameter in the [script](run_download_automation.sh) to match the file path. You can then run: ```shell diff --git a/tools/automation.py b/tools/automation_mentors.py similarity index 80% rename from tools/automation.py rename to tools/automation_mentors.py index 63dd549a..b2345d2c 100644 --- a/tools/automation.py +++ b/tools/automation_mentors.py @@ -9,6 +9,7 @@ import textwrap from enum import Enum +import numpy as np import pandas as pd from ruamel.yaml import YAML from ruamel.yaml.scalarstring import LiteralScalarString @@ -37,6 +38,10 @@ IMAGE_FILE_PATH = "assets/images/mentors" IMAGE_SUFFIX = ".jpeg" +# Mentorship cycle periods +LONG_TERM_REG_PERIOD = "long-term" # long-term registrations period only +DEFAULT_PERIOD = "default" # rest of the cycle, ad-hoc periods + class WriteMode(Enum): # Create new a file @@ -130,23 +135,40 @@ def get_multiline_string(long_text_arg): multiline_str = LiteralScalarString(textwrap.dedent(long_text_arg)) return multiline_str -def get_sort(mentorship_type, num_mentee): +def is_available_for_long_term(mentorship_type): + return mentorship_type == type_long_term[0] or mentorship_type == TYPE_BOTH + +def is_available_for_ad_hoc(mentorship_type): + return mentorship_type == type_ad_hoc[0] or mentorship_type == TYPE_BOTH + +def sort_for_long_term_reg(num_mentee): + """ + Return sort value for mentors available for long-term, based on number of mentees they can take. 
+ Applies only during long-term registration period. + if no mentees, sort to 10; if num_mentees is 1, sort to 100; 2, sort to 200; if >2, sort to highest 500 + """ + + mentee_sort_map = { + 0: 10, + 1: 100, + 2: 200 + } + return mentee_sort_map.get(num_mentee, 500) + + +def get_sort(mentorship_type, current_period, num_mentee): """ - Get mentor's sort value + Get sort value for a new mentor + Rules: https://docs.google.com/document/d/1GwlleBNScHCQ3K8rgvYIB3upIr1BylgWjGR2jxwYWtI/edit?usp=sharing """ - if mentorship_type == TYPE_BOTH or mentorship_type == type_long_term[0]: - if num_mentee > 2: - return 600 - if num_mentee == 2: - return 550 - if num_mentee == 1: - return 500 - return 200 - if mentorship_type == type_ad_hoc[0]: - #todo: (if availability == next month) then adjust the sort value: - return 100 + if current_period == LONG_TERM_REG_PERIOD and is_available_for_long_term(mentorship_type): + return sort_for_long_term_reg(num_mentee) + + if current_period == DEFAULT_PERIOD and is_available_for_ad_hoc(mentorship_type): + return 500 + # else the mentor is not available for any periods return 10 def get_mentorship_type(mentorship_type_str): @@ -237,13 +259,31 @@ def read_yml_file(file_path): return yml_dict +def get_num_mentee_from_row(mentor_row): + """ + Gets the 'num_mentee' value for a new mentor from mentor_row, or use a default value if invalid. 
+ """ + val = mentor_row.iloc[44] + + return int(val) if pd.notna(val) else 0 + +def get_mentor_position(mentor_row): + """ + Returns formatted value for mentor role and company + """ + if not pd.isna(mentor_row.iloc[9]): + return f"{mentor_row.iloc[8].strip()}, {mentor_row.iloc[9].strip()}" + else: + return mentor_row.iloc[8].strip() + def xlsx_to_yaml_parser(mentor_row, mentor_index, + current_period, mentor_disabled=False, mentor_sort=0, mentor_matched=False, - num_mentee=1): + num_mentee=0): """ Prepare mentor's excel data for yaml format """ @@ -251,20 +291,16 @@ def xlsx_to_yaml_parser(mentor_row, focus = get_yaml_block_sequence(mentor_row, FOCUS_START_INDEX, FOCUS_END_INDEX) programming_languages = get_yaml_block_sequence(mentor_row, PROG_LANG_START_INDEX, PROG_LANG_END_INDEX) - # Left commented since the code might be used in the later versions - # to add default picture until the mentor's image is not available - # mentor_image = os.path.join(IMAGE_FILE_PATH, str(mentor_index) + IMAGE_SUFFIX) - mentor_image = f"{IMAGE_FILE_PATH}/{mentor_row.iloc[2].strip().lower().replace(' ', '_')}{IMAGE_SUFFIX} # TODO: Run download_image script to actually download the image" + mentor_image = f"{IMAGE_FILE_PATH}/{mentor_row.iloc[2].strip().lower().replace(' ', '_')}{IMAGE_SUFFIX}" + # Format mentor role and company + mentor_position = get_mentor_position(mentor_row) mentor_type = get_mentorship_type(mentor_row.iloc[4]) + # If mentor is new i.e mentor_sort is 0 (from default input), get the correct num_mentees and sort values if mentor_sort == 0: - mentor_sort = get_sort(mentor_type, num_mentee) - - if not pd.isna(mentor_row.iloc[9]): - mentor_position = f"{mentor_row.iloc[8].strip()}, {mentor_row.iloc[9].strip()}" - else: - mentor_position = mentor_row.iloc[8].strip() + num_mentee = get_num_mentee_from_row(mentor_row) + mentor_sort = get_sort(mentor_type, current_period, num_mentee) mentor = { 'name': mentor_row.iloc[2].strip(), @@ -329,7 +365,7 @@ def 
get_yml_data(yml_file_path): return df_yml_data -def get_all_mentors_in_yml_format(yml_file_path, xlsx_file_path, skip_rows=0): +def get_all_mentors_in_yml_format(yml_file_path, xlsx_file_path, current_period, skip_rows=0): """ Read all mentors from Excel sheet: - if mentor is in current mentors.yml, use existing values for index, disabled, sort, matched and num_mentee. @@ -356,6 +392,7 @@ def get_all_mentors_in_yml_format(yml_file_path, xlsx_file_path, skip_rows=0): if not df_yml_row.empty: mentor = xlsx_to_yaml_parser(df_mentors.iloc[row], df_yml_row['Index'].item(), + current_period, df_yml_row['Disabled'].item(), df_yml_row['Sort'].item(), df_yml_row['Matched'].item(), @@ -363,7 +400,8 @@ def get_all_mentors_in_yml_format(yml_file_path, xlsx_file_path, skip_rows=0): logging.info(f"For {mentor_name} use index, disabled and sort from mentors.yml file") else: mentor = xlsx_to_yaml_parser(df_mentors.iloc[row], - new_index) + new_index, + current_period) new_index += 1 mentors.append(mentor) @@ -372,7 +410,7 @@ def get_all_mentors_in_yml_format(yml_file_path, xlsx_file_path, skip_rows=0): return mentors -def get_new_mentors_in_yml_format(yml_file_path, xlsx_file_path, skip_rows=1): +def get_new_mentors_in_yml_format(yml_file_path, xlsx_file_path, current_period, skip_rows=1): """ Read just new mentors from Excel sheet: - start reading xlsx Mentors from the row 1 (from the date 03/04/2024) @@ -397,7 +435,7 @@ def get_new_mentors_in_yml_format(yml_file_path, xlsx_file_path, skip_rows=1): mentor_name = df_mentors.iloc[row].values[2].strip().lower() if df_yml.loc[df_yml.Name == mentor_name].empty: - mentor = xlsx_to_yaml_parser(df_mentors.iloc[row], new_index) + mentor = xlsx_to_yaml_parser(df_mentors.iloc[row], new_index, current_period) new_index += 1 mentors.append(mentor) @@ -411,25 +449,27 @@ def get_new_mentors_in_yml_format(yml_file_path, xlsx_file_path, skip_rows=1): def run_automation(): logging.basicConfig(level=logging.INFO, format='%(asctime)s - 
%(levelname)s - %(message)s') - if len(sys.argv) == 5: + if len(sys.argv) == 6: xlsx_file_path = sys.argv[1] yml_file_path = sys.argv[2] - mode = WriteMode(sys.argv[3]) - skip_rows = int(sys.argv[4]) + current_period = sys.argv[3] + mode = WriteMode(sys.argv[4]) + skip_rows = int(sys.argv[5]) - logging.info("Params: xlsx: %s yml: %s mode: %s skip_rows: %s", xlsx_file_path, yml_file_path, mode, skip_rows) + logging.info("Params: xlsx: %s yml: %s current_period: %s mode: %s skip_rows: %s", xlsx_file_path, yml_file_path, current_period, mode, skip_rows) else: xlsx_file_path = "samples/mentors.xlsx" yml_file_path = "samples/mentors.yml" + current_period = "default" mode = WriteMode.APPEND skip_rows = 0 - logging.info("Default values: xlsx: %s yml:: %s mode: %s", xlsx_file_path, yml_file_path, mode) + logging.info("Default values: xlsx: %s yml:: %s current_period: %s mode: %s", xlsx_file_path, yml_file_path, current_period, mode) if mode == WriteMode.APPEND: logging.info("Appending option selected.") - list_of_mentors = get_new_mentors_in_yml_format(yml_file_path, xlsx_file_path, skip_rows=skip_rows) + list_of_mentors = get_new_mentors_in_yml_format(yml_file_path, xlsx_file_path, current_period, skip_rows=skip_rows) logging.info("New Mentors size: %d", len(list_of_mentors)) @@ -439,7 +479,7 @@ def run_automation(): elif mode == WriteMode.WRITE: logging.info("Recreate yml - Write option selected.") - list_of_mentors = get_all_mentors_in_yml_format(yml_file_path, xlsx_file_path, skip_rows=skip_rows) + list_of_mentors = get_all_mentors_in_yml_format(yml_file_path, xlsx_file_path, current_period, skip_rows=skip_rows) write_yml_file(yml_file_path, list_of_mentors, WriteMode.WRITE) diff --git a/tools/download_image.py b/tools/download_image.py index 71b2374a..b2fb718d 100644 --- a/tools/download_image.py +++ b/tools/download_image.py @@ -2,10 +2,12 @@ import sys import requests import logging +import pandas as pd logging.basicConfig(level=logging.INFO, format='%(asctime)s - 
%(levelname)s - %(message)s') IMAGE_FILE_PATH='../assets/images/mentors' +SHEET_NAME = "Mentors Images" def download_image(url, mentor_name): """ @@ -17,31 +19,47 @@ def download_image(url, mentor_name): os.makedirs(IMAGE_FILE_PATH, exist_ok=True) - response = requests.get(url, stream=True) + response = requests.get(url, stream=True, timeout=10) response.raise_for_status() with open(image_path, 'wb') as out_file: out_file.write(response.content) - logging.info(f"Image for {mentor_name} downloaded successfully to {image_path}") return image_path except requests.exceptions.RequestException as e: - logging.error(f"Failed to download image from {url}: {e}") + logging.error(f"Failed to download image for {mentor_name}: {e}") return None def run_automation(): - if len(sys.argv) == 3: - url = sys.argv[1] - mentor_name = sys.argv[2] - image_path = download_image(url, mentor_name) - if image_path: - print(f"Image saved to {image_path}") - else: - print("Failed to download the image.") + if len(sys.argv) == 2: + xlsx_file_path = sys.argv[1] + success_count = 0 + + try: + df_mentors = pd.read_excel(xlsx_file_path, sheet_name=SHEET_NAME) + df_mentors.columns = [col.strip() for col in df_mentors.columns] + except Exception as e: + logging.error(f"Failed to read Excel file {xlsx_file_path}: {e}") + return + + for _, row in df_mentors.iterrows(): + mentor_name = "" if pd.isna(row["Mentor Name"]) else str(row["Mentor Name"]).strip() # check for NaN before str(), which would turn it into "nan" + url = "" if pd.isna(row["Image Download URL"]) else str(row["Image Download URL"]).strip() + + if not mentor_name or not url: + logging.warning(f"Skipping download for row with missing data: {row} \n This needs to be fixed manually") + continue + + image_path = download_image(url, mentor_name) + if image_path: + success_count += 1 + + logging.info(f"Successfully downloaded {success_count} images.") + logging.info("Image download process completed.") else: - logging.info(f"Add parameters for download") + logging.info("Script needs 1 parameter (xlsx_file_path) to run") if __name__ == "__main__": diff --git 
a/tools/run_download_automation.sh b/tools/run_download_automation.sh index 36502860..a1849e5d 100644 --- a/tools/run_download_automation.sh +++ b/tools/run_download_automation.sh @@ -7,5 +7,6 @@ source myenv/bin/activate # Install packages pip install -r requirements.txt -# Enter the parameters: IMAGE_URL MENTOR_NAME -python3 download_image.py "https://media.licdn.com/dms/image/v2/D4E03AQFLzC76FGXhiQ/profile-displayphoto-shrink_400_400/profile-displayphoto-shrink_400_400/0/1711114395505?e=1729728000&v=beta&t=P3FN1bSt0aMtt42YyJfiZCRxSqOPllf8U7O9jr2Ki_U" "Samuela Smolorz" \ No newline at end of file +# Enter the parameters: XLSX_FILE_PATH +# Example: samples/mentors.xlsx (should contain two sheets: "WCC All Approved Mentors" and "Mentors Images") +python3 download_image.py samples/mentors.xlsx \ No newline at end of file diff --git a/tools/run_automation.bat b/tools/run_mentor_automation.bat similarity index 62% rename from tools/run_automation.bat rename to tools/run_mentor_automation.bat index 586bbd30..ec5d247a 100644 --- a/tools/run_automation.bat +++ b/tools/run_mentor_automation.bat @@ -10,10 +10,11 @@ echo Installing dependencies... 
pip install -r requirements.txt setlocal -echo Enter arguments for Python script: FILE_PATH_MENTORS_XLSX FILE_PATH_MENTORS_YML MODE SKIP_ROWS -echo Example: mentors_test.xlsx mentors_test.yml a 1 +echo Enter arguments for Python script: FILE_PATH_MENTORS_XLSX FILE_PATH_MENTORS_YML CURRENT_PERIOD MODE SKIP_ROWS +echo Example: mentors_test.xlsx mentors_test.yml default a 1 +echo CURRENT_PERIOD: use 'default' or 'long-term' ('long-term' if during long-term registration period) echo MODE: a - to append new mentors from the xlsx table to mentors.yml echo MODE: w - to create a new mentors.yml file with all mentors that are in the xlsx table echo SKIP_ROWS: To start XLSX in the line 1 -python automation.py samples/mentors.xlsx samples/mentors.yml a 1 +python automation_mentors.py samples/mentors.xlsx samples/mentors.yml default a 1 @echo on \ No newline at end of file diff --git a/tools/run_automation.sh b/tools/run_mentor_automation.sh similarity index 58% rename from tools/run_automation.sh rename to tools/run_mentor_automation.sh index baf613e1..50785640 100644 --- a/tools/run_automation.sh +++ b/tools/run_mentor_automation.sh @@ -7,8 +7,9 @@ source myenv/bin/activate # Install packages pip install -r requirements.txt -# Enter the parameters: FILE_PATH_MENTORS_XLSX FILE_PATH_MENTORS_YML MODE SKIP_ROWS -# Example: samples/mentors.xlsx samples/mentors.yml a +# Enter the parameters: FILE_PATH_MENTORS_XLSX FILE_PATH_MENTORS_YML CURRENT_PERIOD MODE SKIP_ROWS +# Example: samples/mentors.xlsx samples/mentors.yml default a 0 # mode "a" for APPEND new mentors from the xlsx table to the existing mentors.yml # mode "w" for WRITE all mentors from the xlsx table to mentors.yml -python3 automation.py samples/mentors.xlsx ../_data/mentors.yml a 1 \ No newline at end of file +# CURRENT_PERIOD: use "default" or "long-term" ("long-term" if during long-term registration period) +python3 automation_mentors.py samples/mentors.xlsx ../_data/mentors.yml default a 0 \ No newline at end of 
file diff --git a/tools/samples/mentors.xlsx b/tools/samples/mentors.xlsx index 87917e99..8fc50455 100644 Binary files a/tools/samples/mentors.xlsx and b/tools/samples/mentors.xlsx differ diff --git a/tools/tests/automation_functional_test.py b/tools/tests/automation_functional_test.py index 12bc585b..f81f5192 100644 --- a/tools/tests/automation_functional_test.py +++ b/tools/tests/automation_functional_test.py @@ -3,25 +3,27 @@ import sys import pytest from file_utils import TOOLS_PATH -from automation import run_automation, read_yml_file, WriteMode +from automation_mentors import run_automation, read_yml_file, WriteMode MENTOR_2 = "Mentor2 Name" MENTOR_3 = "Mentor3 Name" +MENTOR_4 = "Mentor4 Name" def test_write_mentors_skip_zero_rows(monkeypatch): with tempfile.NamedTemporaryFile(suffix='yml', delete=False) as tmpfile: tmp_filename = tmpfile.name - test_args = ['automation.py', os.path.join(TOOLS_PATH, "samples", "mentors.xlsx"), tmp_filename, WriteMode.WRITE, '0'] + test_args = ['automation_mentors.py', os.path.join(TOOLS_PATH, "samples", "mentors.xlsx"), tmp_filename, "default", WriteMode.WRITE, '0'] monkeypatch.setattr(sys, 'argv', test_args) run_automation() result = read_yml_file(tmp_filename) - assert len(result) == 2, f"Expected to write 2 mentors but added {len(result)}" + assert len(result) == 3, f"Expected to write 3 mentors but added {len(result)}" assert MENTOR_2 == result[0]['name'], f"Expected content to be {MENTOR_2} but got '{result[0]['name']}'" - assert MENTOR_3 == result[1]['name'], f"Expected content to be {MENTOR_3} but got '{result[0]['name']}'" + assert MENTOR_3 == result[1]['name'], f"Expected content to be {MENTOR_3} but got '{result[1]['name']}'" + assert MENTOR_4 == result[2]['name'], f"Expected content to be {MENTOR_4} but got '{result[2]['name']}'" # Clean up the temporary file os.remove(tmp_filename) diff --git a/tools/tests/mentorship_type_test.py b/tools/tests/mentorship_type_test.py index 7aa84c49..ef6a8ac8 100644 --- 
a/tools/tests/mentorship_type_test.py +++ b/tools/tests/mentorship_type_test.py @@ -1,5 +1,5 @@ import unittest -from automation import get_mentorship_type, type_ad_hoc, type_long_term, TYPE_BOTH +from automation_mentors import get_mentorship_type, type_ad_hoc, type_long_term, TYPE_BOTH class TestMentorAutomation(unittest.TestCase): AD_HOC_1 = "Ad-Hoc Format"
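
The sort rules that `get_sort` introduces in this diff can be checked in isolation. The following is a minimal standalone sketch that mirrors the logic shown in the `automation_mentors.py` hunk; it is not the project's module. The string values of `TYPE_LONG_TERM`, `TYPE_AD_HOC` and `TYPE_BOTH` are placeholders assumed for illustration, because the diff does not show what `type_long_term[0]`, `type_ad_hoc[0]` or `TYPE_BOTH` actually contain.

```python
# Standalone sketch of the sort rules from this diff (not the real module).
# ASSUMPTION: the three mentorship-type values below are hypothetical
# placeholders; the real values are defined in automation_mentors.py
# and are not visible in the diff.
TYPE_LONG_TERM = "long-term"
TYPE_AD_HOC = "ad-hoc"
TYPE_BOTH = "both"

# Period names as defined in the diff.
LONG_TERM_REG_PERIOD = "long-term"
DEFAULT_PERIOD = "default"


def is_available_for_long_term(mentorship_type):
    """True for long-term mentors and for mentors offering both formats."""
    return mentorship_type in (TYPE_LONG_TERM, TYPE_BOTH)


def is_available_for_ad_hoc(mentorship_type):
    """True for ad-hoc mentors and for mentors offering both formats."""
    return mentorship_type in (TYPE_AD_HOC, TYPE_BOTH)


def sort_for_long_term_reg(num_mentee):
    """0 mentees -> 10, 1 -> 100, 2 -> 200, anything else -> 500."""
    return {0: 10, 1: 100, 2: 200}.get(num_mentee, 500)


def get_sort(mentorship_type, current_period, num_mentee):
    """Sort value for a new mentor, following the branches in the diff."""
    if current_period == LONG_TERM_REG_PERIOD and is_available_for_long_term(mentorship_type):
        return sort_for_long_term_reg(num_mentee)
    if current_period == DEFAULT_PERIOD and is_available_for_ad_hoc(mentorship_type):
        return 500
    # Mentor is not available in the current period.
    return 10


if __name__ == "__main__":
    # Print the full decision table for a quick visual review.
    for period in (LONG_TERM_REG_PERIOD, DEFAULT_PERIOD):
        for mentorship_type in (TYPE_LONG_TERM, TYPE_AD_HOC, TYPE_BOTH):
            for num_mentee in (0, 1, 2, 3):
                sort = get_sort(mentorship_type, period, num_mentee)
                print(f"{period:>9} | {mentorship_type:>9} | {num_mentee} -> {sort}")
```

Two behaviours are worth noting when reviewing: any `current_period` string other than the two constants (for example a typo in the workflow input) falls through to the final `return 10` for every mentor, and any `num_mentee` missing from the map, including negative values, is sorted to 500 during long-term registration.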