feat(backend): pdf + txt report generator#45

Merged

StarDylan merged 1 commit into main from report-generation

Feb 4, 2026

Owner

StarDylan commented Feb 4, 2026

No description provided.


          feat(backend): pdf + txt report generator

dbe63c1

StarDylan requested a review from Copilot

February 4, 2026 19:42

StarDylan self-assigned this

Copilot started reviewing on behalf of StarDylan

February 4, 2026 19:42

Copilot AI reviewed

Contributor

Copilot AI left a comment

Pull request overview

This pull request adds PDF and text report generation functionality for project transcripts. The feature allows users to export formatted transcripts of their interview projects, including speaker utterances and AI analyses, in both plain text and PDF formats.

Changes:

Added reportlab (v4.4.9) and pillow (v12.1.0) dependencies for PDF generation capabilities
Created new standalone script generate_transcript.py with CLI interface for generating transcript reports
Implemented PDF formatting with custom styles, colors, and layouts for professional-looking reports

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 17 comments.

File	Description
backend/uv.lock	Added reportlab and pillow dependencies with all platform-specific wheels
backend/pyproject.toml	Added reportlab>=4.4.9 to project dependencies
backend/src/generate_transcript.py	New script implementing transcript generation logic with PDF and text output

backend/src/generate_transcript.py

+                  TableStyle,
+              )
+              from reportlab.lib.enums import TA_CENTER
+              from dateutil.relativedelta import relativedelta

Copilot AI Feb 4, 2026

Missing dependency: The code imports 'dateutil.relativedelta' but 'python-dateutil' is not listed as a dependency in pyproject.toml. While it may be a transitive dependency, it should be explicitly declared since the code directly imports from it.

Copilot uses AI. Check for mistakes.

backend/src/generate_transcript.py

Comment on lines +217 to +285

+                  conn = sqlite3.connect(db_path)
+                  conn.row_factory = sqlite3.Row
+                  cursor = conn.cursor()
+                  # Fetch all transcriptions for the project, ordered by transcription_id (ULID)
+                  # Since ULID is chronologically sortable, sorting by transcription_id gives us time order
+                  query = """
+                      SELECT
+                          transcription_id,
+                          speaker,
+                          text_output,
+                          created_at,
+                          session_id,
+                          user_id
+                      FROM transcriptions
+                      WHERE project_id = ?
+                      ORDER BY transcription_id ASC
+                  """
+                  _ = cursor.execute(query, (project_id,))
+                  rows = cursor.fetchall()
+                  if not rows:
+                      conn.close()
+                      return f"No transcriptions found for project: {project_id}"
+                  # Get project name
+                  _ = cursor.execute("SELECT name FROM project WHERE project_id = ?", (project_id,))
+                  project_row = cursor.fetchone()  # pyright: ignore[reportAny]
+                  project_name: str = (
+                      cast(str, project_row["name"])
+                      if project_row and project_row["name"]
+                      else "Untitled Project"
+                  )
+                  # Fetch AI analyses for the project
+                  ai_analyses_query = """
+                      SELECT
+                          analysis_id,
+                          text,
+                          span,
+                          transcript_context_start,
+                          transcript_context_end,
+                          summary
+                      FROM ai_analyses
+                      WHERE project_id = ?
+                      ORDER BY transcript_context_end ASC
+                  """
+                  _ = cursor.execute(ai_analyses_query, (project_id,))
+                  ai_analyses = cursor.fetchall()
+                  # Build a map of transcription_id -> list of analyses that end at that transcription
+                  analyses_by_end: dict[str, list[dict[str, str]]] = {}
+                  for analysis in ai_analyses:  # pyright: ignore[reportAny]
+                      end_id = cast(str, analysis["transcript_context_end"])
+                      if end_id not in analyses_by_end:
+                          analyses_by_end[end_id] = []
+                      analyses_by_end[end_id].append(
+                          {
+                              "analysis_id": cast(str, analysis["analysis_id"]),
+                              "text": cast(str, analysis["text"]),
+                              "span": cast(str, analysis["span"]) if analysis["span"] else "",
+                              "summary": cast(str, analysis["summary"]),
+                              "start": cast(str, analysis["transcript_context_start"]),
+                              "end": end_id,
+                          }
+                      )
+                  conn.close()

Copilot AI Feb 4, 2026

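Resource leak: The database connection is not properly managed. If an exception occurs after conn = sqlite3.connect(db_path) but before conn.close(), the connection will not be closed. Wrap the connection in a context manager that closes it, such as 'with contextlib.closing(sqlite3.connect(db_path)) as conn:', or use try/finally, instead of manually calling conn.close(). Note that 'with sqlite3.connect(db_path) as conn:' on its own only commits or rolls back the transaction; it does not close the connection.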
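
A minimal sketch of context-managed cleanup, assuming a hypothetical helper named fetch_project_name (not part of this PR) and the project table queried in the snippet above. It wraps the connection in contextlib.closing, because the sqlite3 connection's own context manager handles transactions and leaves the connection open:

```python
import sqlite3
from contextlib import closing


def fetch_project_name(db_path: str, project_id: str) -> str:
    """Return the project's name, or a fallback when it is missing.

    Hypothetical helper for illustration; the table and column names
    mirror the query shown in the reviewed snippet.
    """
    # closing() guarantees conn.close() runs even if a query raises.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.row_factory = sqlite3.Row
        row = conn.execute(
            "SELECT name FROM project WHERE project_id = ?", (project_id,)
        ).fetchone()
        if row and row["name"]:
            return str(row["name"])
        return "Untitled Project"
```

The same wrapper would apply to the larger function in the PR: every early return and every exception path then closes the connection without a separate conn.close() call.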

backend/src/generate_transcript.py

Comment on lines +409 to +413

+                  if output_file:
+                      output_path = Path(output_file)
+                      output_path.parent.mkdir(parents=True, exist_ok=True)
+                      _ = output_path.write_text(transcript, encoding="utf-8")
+                      print(f"Transcript saved to: {output_file}")

Copilot AI Feb 4, 2026

File overwrite without warning: Both the text transcript and PDF files are written without checking if they already exist, and without asking for user confirmation. This could silently overwrite existing files. Consider adding a check or warning when files already exist, or provide a --force flag for the CLI.
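
One way the overwrite guard could look, as a sketch only: write_report and its force parameter are hypothetical names, and the --force CLI flag is the reviewer's proposal rather than something the script currently has. Opening the file in exclusive-creation mode ("x") makes the existence check and the write a single step:

```python
from pathlib import Path


def write_report(output_path: Path, content: str, force: bool = False) -> None:
    """Write content to output_path, refusing to overwrite unless force is set.

    Hypothetical helper for illustration; force would be wired to a
    --force flag declared with argparse's action="store_true".
    """
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError if the file already exists,
    # so there is no gap between checking and writing.
    mode = "w" if force else "x"
    with output_path.open(mode, encoding="utf-8") as handle:
        handle.write(content)
```

A caller in the CLI could catch FileExistsError and report it through parser.error, so the user sees a clear message instead of a traceback.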

backend/src/generate_transcript.py


+                  args = parser.parse_args()
+                  project_id_arg = cast(str, args.project_id)

Copilot AI Feb 4, 2026

Missing validation for CLI arguments: The CLI doesn't validate the project_id format (should be a valid ULID). If an invalid project_id is provided, it will fail later with potentially confusing error messages. Consider validating the project_id format early and providing clear error messages.

Suggested change

      
-                  project_id_arg = cast(str, args.project_id)
+                  project_id_arg = cast(str, args.project_id)
+                  # Validate that the provided project_id is a well-formed ULID before proceeding.
+                  try:
+                      # This will raise ValueError if the format is invalid.
+                      _ = ulid.ULID.from_str(project_id_arg.upper())  # pyright: ignore[reportAny]
+                  except ValueError as exc:
+                      parser.error(f"Invalid project_id '{project_id_arg}': {exc}")

backend/src/generate_transcript.py

Comment on lines +53 to +54

+                  doc = SimpleDocTemplate(
+                      pdf_file, pagesize=letter, topMargin=0.75 * inch, bottomMargin=0.75 * inch

Copilot AI Feb 4, 2026

Magic number without explanation: The value 0.75 for top and bottom margins is not explained. Consider extracting this to a named constant at the module level with a descriptive name, such as DEFAULT_PAGE_MARGIN_INCHES = 0.75.

backend/src/generate_transcript.py

Comment on lines +415 to +422

+                      total_length_duration = extract_timestamp_from_ulid(
+                          rows[-1]["transcription_id"]  # pyright: ignore[reportAny]
+                      ) - extract_timestamp_from_ulid(rows[0]["transcription_id"])  # pyright: ignore[reportAny]
+                      humanh_readable_length = " ".join(
+                          human_readable(
+                              relativedelta(seconds=int(total_length_duration.total_seconds()))
+                          )
+                      )

Copilot AI Feb 4, 2026

Incorrect duration calculation: The total_length_duration is calculated as the difference between the last and first transcription timestamps. However, this only measures the time span from start to finish, not the actual accumulated duration of all transcriptions. If there were gaps in recording (e.g., pauses), this would incorrectly report the total time including those gaps. Consider clarifying the metric name to "Time Span" or "Session Duration" instead of "Total Time" to avoid confusion.
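
The difference between the two metrics can be sketched as follows. The function names and the five-minute gap threshold are assumptions made for illustration; since the timestamps mark when each transcription was created, not how long it lasted, the second figure is only an approximation of recording time:

```python
from datetime import datetime, timedelta


def time_span(timestamps: list[datetime]) -> timedelta:
    """Wall-clock time from the first to the last utterance, gaps included."""
    if not timestamps:
        return timedelta(0)
    return max(timestamps) - min(timestamps)


def active_duration(
    timestamps: list[datetime],
    max_gap: timedelta = timedelta(minutes=5),  # assumed pause threshold
) -> timedelta:
    """Sum only the intervals between consecutive utterances no longer than max_gap."""
    ordered = sorted(timestamps)
    total = timedelta(0)
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap <= max_gap:
            total += gap
    return total
```

With utterances at minutes 0, 1, 2, 62, and 63, time_span reports 63 minutes while active_duration reports 3 minutes, which is why labeling the current value "Time Span" is the less misleading choice.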

backend/src/generate_transcript.py

+                      total_length_duration = extract_timestamp_from_ulid(
+                          rows[-1]["transcription_id"]  # pyright: ignore[reportAny]
+                      ) - extract_timestamp_from_ulid(rows[0]["transcription_id"])  # pyright: ignore[reportAny]
+                      humanh_readable_length = " ".join(

Copilot AI Feb 4, 2026

Typo in variable name: 'humanh_readable_length' should be 'human_readable_length'. The variable name has an extra 'h'.

backend/src/generate_transcript.py

Comment on lines +217 to +219

+                  conn = sqlite3.connect(db_path)
+                  conn.row_factory = sqlite3.Row
+                  cursor = conn.cursor()

Copilot AI Feb 4, 2026

Inconsistent database access pattern: This script uses raw sqlite3 connections directly, while the rest of the codebase (e.g., database.py) uses SQLAlchemy with the PersistentDatabase abstraction. This creates inconsistency in how the database is accessed and makes the code harder to maintain. Consider using the existing PersistentDatabase class and SQLAlchemy for consistency.

backend/src/generate_transcript.py

+                          and current_speaker is not None
+                          and current_timestamp is not None
+                      ):
+                          timestamp_str = current_timestamp.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]

Copilot AI Feb 4, 2026

Duplicated date formatting: The expression strftime("%Y-%m-%d %H:%M:%S.%f")[:-3] is repeated in multiple places to format timestamps. Consider extracting it into a constant or helper function for consistency and maintainability.
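
A minimal sketch of such a helper, assuming the hypothetical names TIMESTAMP_FORMAT and format_timestamp (neither exists in the PR):

```python
from datetime import datetime

# Shared format string; %f expands to six microsecond digits.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def format_timestamp(value: datetime) -> str:
    """Format a datetime with millisecond precision."""
    # Dropping the last three digits truncates microseconds to milliseconds.
    return value.strftime(TIMESTAMP_FORMAT)[:-3]
```

Each call site would then read timestamp_str = format_timestamp(current_timestamp), and a later change to the precision or layout happens in one place.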

StarDylan merged commit 177afcd into main

8 checks passed

StarDylan deleted the report-generation branch

February 4, 2026 19:58
