@gopal-raj-suresh

Description

This PR introduces the DocSummarization blueprint, a GenAI-powered application for intelligent document summarization. The blueprint supports multiple document formats (PDF, TXT) and provides configurable summarization styles and lengths, making it suitable for enterprise document processing workflows.

Key Features:

  • PDF and text document summarization
  • Customizable summary length (short, medium, long)
  • Multiple summarization styles (executive, technical, bullet points)
  • Dual input modes (file upload or text paste)

Issues

n/a

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

No new repository-level dependencies.

All dependencies for the DocSummarization blueprint are listed in:

  • Backend: DocSummarization/backend/requirements.txt
  • Frontend: DocSummarization/frontend/package.json

Key technologies: FastAPI, React, OpenAI-compatible LLM integration, PyPDF

Tests

Testing Instructions:

git clone https://github.com/cld2labs/GenAIExamples.git
cd GenAIExamples
git checkout cld2labs/doc-summarization
cd DocSummarization

@github-actions

Dependency Review

The following issues were found:
  • ❌ 4 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 3 package(s) with unknown licenses.
See the Details below.

Vulnerabilities

DocSummarization/backend/requirements.txt

| Name | Version | Vulnerability | Severity |
|---|---|---|---|
| Pillow | 10.2.0 | Pillow buffer overflow vulnerability | high |
| python-multipart | 0.0.6 | python-multipart vulnerable to Content-Type Header ReDoS | high |
| | | Denial of service (DoS) via deformation `multipart/form-data` boundary | high |
| pypdf | 6.1.1 | pypdf possibly loops infinitely when reading DCT inline images without EOF marker | moderate |
| | | pypdf can exhaust RAM via manipulated LZWDecode streams | moderate |
| | | pypdf's LZWDecode streams be manipulated to exhaust RAM | moderate |
| | | pypdf has possible long runtimes for missing /Root object with large /Size values | low |
| | | pypdf has possible long runtimes for malformed startxref | low |
| requests | 2.31.0 | Requests `Session` object does not verify requests after making first request with verify=False | moderate |
| | | Requests vulnerable to .netrc credentials leak via malicious URLs | moderate |

License Issues

DocSummarization/backend/requirements.txt

| Package | Version | License | Issue Type |
|---|---|---|---|
| Pillow | 10.2.0 | Null | Unknown License |
| pypdf | 6.1.1 | Null | Unknown License |

DocSummarization/frontend/package.json

| Package | Version | License | Issue Type |
|---|---|---|---|
| lucide-react | ^0.294.0 | Null | Unknown License |

Scanned Files

  • DocSummarization/backend/requirements.txt
  • DocSummarization/frontend/package.json
