feat(backend): pdf + txt report generator #45

Merged
StarDylan merged 1 commit into main from report-generation on Feb 4, 2026

Conversation

@StarDylan
Owner

No description provided.

Contributor

Copilot AI left a comment

Pull request overview

This pull request adds PDF and text report generation functionality for project transcripts. The feature allows users to export formatted transcripts of their interview projects, including speaker utterances and AI analyses, in both plain text and PDF formats.

Changes:

  • Added reportlab (v4.4.9) and pillow (v12.1.0) dependencies for PDF generation capabilities
  • Created new standalone script generate_transcript.py with CLI interface for generating transcript reports
  • Implemented PDF formatting with custom styles, colors, and layouts for professional-looking reports

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 17 comments.

File                                 Description
backend/uv.lock                      Added reportlab and pillow dependencies with all platform-specific wheels
backend/pyproject.toml               Added reportlab>=4.4.9 to project dependencies
backend/src/generate_transcript.py   New script implementing transcript generation logic with PDF and text output

    TableStyle,
)
from reportlab.lib.enums import TA_CENTER
from dateutil.relativedelta import relativedelta

Copilot AI Feb 4, 2026

Missing dependency: The code imports 'dateutil.relativedelta' but 'python-dateutil' is not listed as a dependency in pyproject.toml. While it may be a transitive dependency, it should be explicitly declared since the code directly imports from it.

Copilot uses AI. Check for mistakes.
Comment on lines +217 to +285
conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row
cursor = conn.cursor()

# Fetch all transcriptions for the project, ordered by transcription_id (ULID)
# Since ULID is chronologically sortable, sorting by transcription_id gives us time order
query = """
    SELECT
        transcription_id,
        speaker,
        text_output,
        created_at,
        session_id,
        user_id
    FROM transcriptions
    WHERE project_id = ?
    ORDER BY transcription_id ASC
"""

_ = cursor.execute(query, (project_id,))
rows = cursor.fetchall()

if not rows:
    conn.close()
    return f"No transcriptions found for project: {project_id}"

# Get project name
_ = cursor.execute("SELECT name FROM project WHERE project_id = ?", (project_id,))
project_row = cursor.fetchone()  # pyright: ignore[reportAny]
project_name: str = (
    cast(str, project_row["name"])
    if project_row and project_row["name"]
    else "Untitled Project"
)

# Fetch AI analyses for the project
ai_analyses_query = """
    SELECT
        analysis_id,
        text,
        span,
        transcript_context_start,
        transcript_context_end,
        summary
    FROM ai_analyses
    WHERE project_id = ?
    ORDER BY transcript_context_end ASC
"""
_ = cursor.execute(ai_analyses_query, (project_id,))
ai_analyses = cursor.fetchall()

# Build a map of transcription_id -> list of analyses that end at that transcription
analyses_by_end: dict[str, list[dict[str, str]]] = {}
for analysis in ai_analyses:  # pyright: ignore[reportAny]
    end_id = cast(str, analysis["transcript_context_end"])
    if end_id not in analyses_by_end:
        analyses_by_end[end_id] = []
    analyses_by_end[end_id].append(
        {
            "analysis_id": cast(str, analysis["analysis_id"]),
            "text": cast(str, analysis["text"]),
            "span": cast(str, analysis["span"]) if analysis["span"] else "",
            "summary": cast(str, analysis["summary"]),
            "start": cast(str, analysis["transcript_context_start"]),
            "end": end_id,
        }
    )

conn.close()

Copilot AI Feb 4, 2026

Resource leak: The database connection is not properly managed. If an exception occurs after conn = sqlite3.connect(db_path) but before conn.close(), the connection will not be closed. Note that 'with sqlite3.connect(db_path) as conn:' only commits or rolls back the open transaction; it does not close the connection. To ensure proper cleanup, wrap the connection in contextlib.closing() (or use try/finally) instead of manually calling conn.close().
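
A minimal sketch of the cleanup pattern, assuming only the standard library. Because sqlite3's own connection context manager handles transactions rather than closing, this version uses contextlib.closing. The helper name fetch_project_name is hypothetical; the project table and its name column are taken from the query in the diff above.

```python
import sqlite3
from contextlib import closing


def fetch_project_name(db_path: str, project_id: str) -> str:
    """Return the project's name, closing the connection even on errors."""
    # closing() calls conn.close() when the block exits, whether normally
    # or because an exception was raised inside it.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.row_factory = sqlite3.Row
        row = conn.execute(
            "SELECT name FROM project WHERE project_id = ?", (project_id,)
        ).fetchone()
    # Same fallback as the script: missing row or empty name.
    if row is not None and row["name"]:
        return str(row["name"])
    return "Untitled Project"
```

With this shape, the early return for "no transcriptions found" would no longer need its own conn.close() call.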

Comment on lines +409 to +413
if output_file:
    output_path = Path(output_file)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    _ = output_path.write_text(transcript, encoding="utf-8")
    print(f"Transcript saved to: {output_file}")

Copilot AI Feb 4, 2026

File overwrite without warning: Both the text transcript and PDF files are written without checking if they already exist, and without asking for user confirmation. This could silently overwrite existing files. Consider adding a check or warning when files already exist, or provide a --force flag for the CLI.
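
One possible shape for the overwrite guard, as a sketch only. The function name write_report and its force parameter are hypothetical and not part of the PR; a --force CLI flag could be passed straight through to it.

```python
from pathlib import Path


def write_report(output_path: Path, content: str, *, force: bool = False) -> None:
    """Write content to output_path, refusing to overwrite unless force is set."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" creates the file exclusively and raises FileExistsError if it
    # already exists; mode "w" truncates and overwrites.
    mode = "w" if force else "x"
    with output_path.open(mode, encoding="utf-8") as handle:
        handle.write(content)
```

Opening with mode "x" avoids a separate exists() check, so there is no window between the check and the write in which another process could create the file.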


args = parser.parse_args()

project_id_arg = cast(str, args.project_id)

Copilot AI Feb 4, 2026

Missing validation for CLI arguments: The CLI doesn't validate the project_id format (should be a valid ULID). If an invalid project_id is provided, it will fail later with potentially confusing error messages. Consider validating the project_id format early and providing clear error messages.

Suggested change
- project_id_arg = cast(str, args.project_id)
+ project_id_arg = cast(str, args.project_id)
+ # Validate that the provided project_id is a well-formed ULID before proceeding.
+ try:
+     # This will raise ValueError if the format is invalid.
+     _ = ulid.ULID.from_str(project_id_arg.upper())  # pyright: ignore[reportAny]
+ except ValueError as exc:
+     parser.error(f"Invalid project_id '{project_id_arg}': {exc}")
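
If adding a ULID library just for validation is undesirable, a dependency-free alternative could look like the sketch below. It checks only the textual form defined by the ULID spec (26 Crockford base32 characters, first character 0 to 7). The names ULID_PATTERN, ulid_arg, and build_parser are hypothetical, and the positional project_id argument is an assumption about the CLI.

```python
import argparse
import re

# Crockford base32 excludes I, L, O and U. The first character is limited to
# 0-7 because a ULID holds 128 bits in 26 characters of 5 bits each.
ULID_PATTERN = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")


def ulid_arg(value: str) -> str:
    """argparse type callback that rejects strings that are not ULID-shaped."""
    if not ULID_PATTERN.fullmatch(value.upper()):
        raise argparse.ArgumentTypeError(f"invalid ULID: {value!r}")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate a transcript report")
    # Assumption: project_id is a positional argument.
    parser.add_argument("project_id", type=ulid_arg)
    return parser
```

Using the type= hook lets argparse report the problem in its usual usage-error format before any database work starts.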

Comment on lines +53 to +54
doc = SimpleDocTemplate(
    pdf_file, pagesize=letter, topMargin=0.75 * inch, bottomMargin=0.75 * inch

Copilot AI Feb 4, 2026

Magic number without explanation: The value 0.75 for top and bottom margins is not explained. Consider extracting this to a named constant at the module level with a descriptive name, such as DEFAULT_PAGE_MARGIN_INCHES = 0.75.

Comment on lines +415 to +422
total_length_duration = extract_timestamp_from_ulid(
    rows[-1]["transcription_id"]  # pyright: ignore[reportAny]
) - extract_timestamp_from_ulid(rows[0]["transcription_id"])  # pyright: ignore[reportAny]
humanh_readable_length = " ".join(
    human_readable(
        relativedelta(seconds=int(total_length_duration.total_seconds()))
    )
)

Copilot AI Feb 4, 2026

Incorrect duration calculation: The total_length_duration is calculated as the difference between the last and first transcription timestamps. However, this only measures the time span from start to finish, not the actual accumulated duration of all transcriptions. If there were gaps in recording (e.g., pauses), this would incorrectly report the total time including those gaps. Consider clarifying the metric name to "Time Span" or "Session Duration" instead of "Total Time" to avoid confusion.
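
To make the difference concrete, here is a rough sketch that contrasts the two metrics. The PR's extract_timestamp_from_ulid is not shown in this conversation, so ulid_timestamp below is a stand-in that decodes the first 10 characters of a ULID (the 48-bit millisecond timestamp, per the ULID spec). The names time_span and active_time, and the max_gap pause threshold, are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ulid_timestamp(ulid_str: str) -> datetime:
    """Decode the millisecond timestamp stored in a ULID's first 10 characters."""
    millis = 0
    for char in ulid_str[:10].upper():
        millis = millis * 32 + CROCKFORD.index(char)
    return EPOCH + timedelta(milliseconds=millis)


def time_span(stamps: list[datetime]) -> timedelta:
    """Wall-clock span from first to last utterance, pauses included."""
    return max(stamps) - min(stamps)


def active_time(stamps: list[datetime], max_gap: timedelta) -> timedelta:
    """Sum only the gaps no longer than max_gap, treating longer ones as pauses."""
    ordered = sorted(stamps)
    total = timedelta()
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap <= max_gap:
            total += gap
    return total
```

For utterances at 0 s, 10 s, 20 s, 600 s, and 610 s with a 60 s threshold, time_span reports 610 s while active_time reports 30 s. What the script currently labels as total time corresponds to time_span.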

total_length_duration = extract_timestamp_from_ulid(
    rows[-1]["transcription_id"]  # pyright: ignore[reportAny]
) - extract_timestamp_from_ulid(rows[0]["transcription_id"])  # pyright: ignore[reportAny]
humanh_readable_length = " ".join(

Copilot AI Feb 4, 2026

Typo in variable name: 'humanh_readable_length' should be 'human_readable_length'. The variable name has an extra 'h'.

Comment on lines +217 to +219
conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row
cursor = conn.cursor()

Copilot AI Feb 4, 2026

Inconsistent database access pattern: This script uses raw sqlite3 connections directly, while the rest of the codebase (e.g., database.py) uses SQLAlchemy with the PersistentDatabase abstraction. This creates inconsistency in how the database is accessed and makes the code harder to maintain. Consider using the existing PersistentDatabase class and SQLAlchemy for consistency.

    and current_speaker is not None
    and current_timestamp is not None
):
    timestamp_str = current_timestamp.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]

Copilot AI Feb 4, 2026

Inconsistent date formatting: The code uses strftime("%Y-%m-%d %H:%M:%S.%f")[:-3] to format timestamps in multiple places. This format string is repeated throughout the code. Consider extracting this to a constant or helper function for consistency and maintainability.
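
A small sketch of what such a helper might look like. TIMESTAMP_FORMAT and format_timestamp are hypothetical names; the format string and the [:-3] slice (which trims microseconds to milliseconds) are copied from the diff.

```python
from datetime import datetime

# Single source of truth for the timestamp layout used in reports.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def format_timestamp(timestamp: datetime) -> str:
    """Format a datetime with millisecond precision."""
    # %f always yields six digits, so dropping the last three leaves milliseconds.
    return timestamp.strftime(TIMESTAMP_FORMAT)[:-3]
```

Each call site would then read timestamp_str = format_timestamp(current_timestamp), and a later change of format would touch one line.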

@StarDylan merged commit 177afcd into main on Feb 4, 2026
8 checks passed
@StarDylan deleted the report-generation branch on February 4, 2026 19:58