This repository was archived by the owner on Jan 5, 2026. It is now read-only.
Merged
44 changes: 44 additions & 0 deletions python/chatwithdata/common_utils.py
@@ -0,0 +1,44 @@
"""Common utility functions for the search module."""

import argparse
from urllib.parse import urlparse


def absolute_url(value):
    """
    Validate that the input is an absolute URL with a valid scheme and netloc.

    Args:
        value (str): The URL to validate.
    Raises:
        argparse.ArgumentTypeError: If the URL is not absolute or does not have a valid scheme and netloc.
    Returns:
        str: The validated absolute URL.
    """
    parsed = urlparse(value)
    # Check if the scheme and netloc are present
    if not parsed.scheme or not parsed.netloc:
        raise argparse.ArgumentTypeError(f"'{value}' is not a valid absolute URL")
    return value


def valid_name(value):
    """
    Validate that the input is a valid name that may include alphanumeric symbols, "-" or "_".
    The method doesn't check a specific length and case.

    Args:
        value (str): The name to validate.
    Raises:
        argparse.ArgumentTypeError: If the name is empty, contains only whitespace, or has invalid characters.
    Returns:
        str: The validated name.
    """
    if not value or not value.strip():
        raise argparse.ArgumentTypeError(f"'{value}' is not a valid name")
    parsed_value = value.replace("-", "").replace("_", "")
    if not parsed_value.isalnum():
        raise argparse.ArgumentTypeError(
            f"'{value}' contains invalid characters. Look at the documentation for naming conventions."
        )
    return value
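Both validators raise `argparse.ArgumentTypeError`, which suggests they are meant to be passed as `type=` callables to `argparse`. The sketch below shows that wiring under this assumption; the flag names `--endpoint` and `--index_name` and the helper `build_parser` are hypothetical and do not appear in this change. The validators are repeated in condensed form so the block runs on its own.

```python
import argparse
from urllib.parse import urlparse


def absolute_url(value):
    # Condensed copy of the validator from common_utils.py
    parsed = urlparse(value)
    if not parsed.scheme or not parsed.netloc:
        raise argparse.ArgumentTypeError(f"'{value}' is not a valid absolute URL")
    return value


def valid_name(value):
    # Condensed copy of the validator from common_utils.py
    if not value or not value.strip():
        raise argparse.ArgumentTypeError(f"'{value}' is not a valid name")
    if not value.replace("-", "").replace("_", "").isalnum():
        raise argparse.ArgumentTypeError(f"'{value}' contains invalid characters.")
    return value


def build_parser():
    # Hypothetical flags, chosen only for illustration
    parser = argparse.ArgumentParser(description="Validator demo")
    parser.add_argument("--endpoint", type=absolute_url, required=True)
    parser.add_argument("--index_name", type=valid_name, required=True)
    return parser


if __name__ == "__main__":
    demo_args = build_parser().parse_args(
        ["--endpoint", "https://example.org", "--index_name", "docs-index_01"]
    )
    print(demo_args.endpoint, demo_args.index_name)
```

When a `type=` callable raises `ArgumentTypeError`, `argparse` reports the message as a usage error and exits, so callers do not need their own error handling for invalid input.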
Binary file added python/chatwithdata/data/Benefit_Options.pdf
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added python/chatwithdata/data/PerksPlus.pdf
Binary file not shown.
Binary file added python/chatwithdata/data/employee_handbook.pdf
Binary file not shown.
37 changes: 37 additions & 0 deletions python/chatwithdata/data/readme.md
@@ -0,0 +1,37 @@
# Required data

This folder contains the initial data, in PDF format, that must be uploaded to blob storage for this template to work.

To upload the initial data to blob storage, pick one of the options below.

_Note: By default, this solution deploys all Azure resources to a VNet. To successfully execute any of the commands outlined in this README, they must be run from within the network._

## How to upload data into the storage using the Python script

To run the upload_data.py script, make sure Python 3.8+ is installed, then install the required dependencies (azure-identity and azure-storage-blob) by running `pip install azure-identity azure-storage-blob`. Authenticate to Azure using `az login` or environment variables for service principal credentials.

Run the script from the terminal with the following command:

python -m upload_data --storage_name <your_storage_account_name> --container_name <your_container_name>

Replace <your_storage_account_name> and <your_container_name> with your Azure Storage account and container names. The script uploads all .pdf files from its own directory (including subdirectories) to the specified container, creating the container if it doesn't exist. The storage account name must contain only lowercase letters and digits; the script rejects any other name before uploading. Log messages on the console confirm each step of the upload.
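
If you want to check a storage account name before running the script, the following minimal sketch may help. It assumes the Azure naming rule for storage accounts (3 to 24 characters, lowercase letters and digits only); the function name `is_valid_storage_account_name` is hypothetical and is not part of `upload_data.py`, whose own check does not enforce the length.

```python
import re

# Assumed rule: 3-24 characters, lowercase letters and digits only
_ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if the name matches the assumed Azure naming rule."""
    return _ACCOUNT_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("mystorage01", "MyStorage", "my-storage", "ab"):
        print(candidate, is_valid_storage_account_name(candidate))
```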

## How to upload data using the Linux Shell Script

Authenticate to Azure using `az login` or environment variables for service principal credentials. Execute the script:

```bash
./upload_data.sh <storage_account_name> <container_name>
```

Replace <storage_account_name> and <container_name> with your Azure Storage account and container names.

## How to upload data using the PowerShell Script

Authenticate to Azure using `az login` or environment variables for service principal credentials. Execute the PowerShell script:

```powershell
./upload_data.ps1 -StorageAccountName <storage_account_name> -ContainerName <container_name>
```

Replace <storage_account_name> and <container_name> with your Azure Storage account and container names.
2 changes: 2 additions & 0 deletions python/chatwithdata/data/requirements.txt
@@ -0,0 +1,2 @@
azure-storage-blob>=12.19.0
azure-identity>=1.16.1
Binary file added python/chatwithdata/data/role_library.pdf
Binary file not shown.
75 changes: 75 additions & 0 deletions python/chatwithdata/data/upload_data.ps1
@@ -0,0 +1,75 @@
<#
.SYNOPSIS
Uploads all PDF files from the current directory to an Azure Blob Storage container.

.DESCRIPTION
This script uses the Azure CLI to authenticate and upload files to Azure Blob Storage.
It checks if the specified container exists and creates it if necessary.

.PARAMETER StorageAccountName
The name of the Azure Storage account.

.PARAMETER ContainerName
The name of the Azure Blob Storage container.

.EXAMPLE
./upload_data.ps1 -StorageAccountName "mystorageaccount" -ContainerName "mycontainer"
#>

param (
[Parameter(Mandatory = $true)]
[string]$StorageAccountName,

[Parameter(Mandatory = $true)]
[string]$ContainerName
)

# Get the current directory as a string path.
# Get-Location returns a PathInfo object, so .Path is needed for string operations such as .Length.
$LocalFolder = (Get-Location).Path

# Check if the container exists, and create it if it doesn't
Write-Host "Checking if container '$ContainerName' exists in storage account '$StorageAccountName'..."
$ContainerExists = az storage container exists `
    --account-name $StorageAccountName `
    --name $ContainerName `
    --auth-mode login `
    --query "exists" `
    --output tsv

if ($ContainerExists -ne "true") {
    Write-Host "Container '$ContainerName' does not exist. Creating it..."
    az storage container create `
        --account-name $StorageAccountName `
        --name $ContainerName `
        --auth-mode login `
        --output none
    if ($LASTEXITCODE -ne 0) {
        Write-Host "Failed to create container '$ContainerName'." -ForegroundColor Red
        exit 1
    }
    Write-Host "Container '$ContainerName' created successfully." -ForegroundColor Green
} else {
    Write-Host "Container '$ContainerName' already exists." -ForegroundColor Green
}

# Upload all PDF files from the current directory and its subdirectories
Write-Host "Uploading PDF files from '$LocalFolder' to container '$ContainerName'..."
Get-ChildItem -Path $LocalFolder -Recurse -Filter *.pdf | ForEach-Object {
    $FilePath = $_.FullName
    # Blob name is the path relative to the current directory, with forward slashes
    $RelativePath = $FilePath.Substring($LocalFolder.Length).TrimStart('\', '/')
    $BlobName = $RelativePath -replace '\\', '/'
    Write-Host "Uploading '$FilePath' as blob '$BlobName'..."
    az storage blob upload `
        --account-name $StorageAccountName `
        --container-name $ContainerName `
        --name $BlobName `
        --auth-mode login `
        --file $FilePath `
        --overwrite
    if ($LASTEXITCODE -ne 0) {
        Write-Host "Failed to upload '$FilePath'." -ForegroundColor Red
    } else {
        Write-Host "Uploaded '$FilePath' successfully." -ForegroundColor Green
    }
}

Write-Host "Upload process completed." -ForegroundColor Cyan
113 changes: 113 additions & 0 deletions python/chatwithdata/data/upload_data.py
@@ -0,0 +1,113 @@
"""
Initialize blob storage with local data.

We assume that this code will be executed just once to prepare a blob container for experiments.
"""

import argparse
import logging
import os
from pathlib import Path

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

logger = logging.getLogger(__name__)

# Setting the threshold of logger to DEBUG
logger.setLevel(logging.DEBUG)

# Create a console handler and set its level to DEBUG
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.DEBUG)

# Create a formatter and set it for the console handler
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
console_handler.setFormatter(formatter)

# Add the console handler to the logger
logger.addHandler(console_handler)

STORAGE_ACCOUNT_URL = "https://{storage_account_name}.blob.core.windows.net"


def upload_data_files(
    credential: DefaultAzureCredential,
    storage_account_name: str,
    storage_container: str,
    local_folder: str,
):
    """Upload every PDF under local_folder to the container, creating the container if needed."""
    account_url = STORAGE_ACCOUNT_URL.format(storage_account_name=storage_account_name)
    blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)
    blob_container_client = blob_service_client.get_container_client(storage_container)

    if not blob_container_client.exists():
        logger.info(f"Creating {storage_container} container.")
        blob_container_client.create_container()
        logger.info("Done.")

    for file in Path(local_folder).rglob("*.pdf"):
        logger.info(f"Uploading {file} to {storage_container}.")

        # Construct the blob name from the file path relative to local_folder
        file_subpath = os.path.relpath(file, start=local_folder)

        # Flatten the relative path into a single name by replacing path separators
        file_name = file_subpath.replace(os.sep, "_")

        try:
            logger.info(f"Ready to copy: {str(file)} to {file_name}.")
            with open(file=str(file), mode="rb") as data:
                blob_container_client.upload_blob(name=file_name, data=data, overwrite=True)
            logger.info("Done.")
        except Exception as e:
            # Log at error level, then re-raise so the script stops on the first failure
            logger.error(f"Exception uploading file name {file_name}: {e}")
            raise


def main():
    """
    Upload data files to Azure Blob Storage.

    This function reads the parameters from the command line, authenticates to Azure using default credentials,
    and uploads the PDF files located next to this script to the specified Azure Blob Storage container.
    """
    logger.info("Read and check parameters.")
    # Extract the configuration parameters from the command line
    parser = argparse.ArgumentParser(description="Parameter parser")
    parser.add_argument(
        "--storage_name",
        required=True,
        help="Azure storage account name",
    )
    parser.add_argument(
        "--container_name",
        required=True,
        help="Azure storage container name",
    )
    args = parser.parse_args()

    # Validate storage account name
    if not args.storage_name.islower() or not args.storage_name.isalnum():
        raise ValueError("Storage account name must be a lowercase alphanumeric string (letters and digits).")

    # Using default Azure credentials assuming that it has all needed permissions
    logger.info("Authenticate code into Azure using default credentials.")
    credential = DefaultAzureCredential()

    # Upload the PDF files from the script's directory
    logger.info("Uploading process has been started.")
    upload_data_files(
        credential=credential,
        storage_account_name=args.storage_name,
        storage_container=args.container_name,
        local_folder=os.path.dirname(__file__),
    )
    logger.info("Uploading process has been completed.")


# This block ensures that the script runs the main function only when executed directly,
# and not when imported as a module in another script.
if __name__ == "__main__":
    main()
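The upload scripts in this change derive blob names in two ways: `upload_data.py` and `upload_data.sh` flatten the relative path by replacing separators with `_`, while `upload_data.ps1` keeps the relative path with `/` separators. The following is a minimal sketch of both schemes for comparison; the function names `flat_blob_name` and `hierarchical_blob_name` and the sample paths are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def flat_blob_name(file_path: str, local_folder: str) -> str:
    """Relative path with separators replaced by underscores (Python and bash scripts)."""
    relative = PurePosixPath(file_path).relative_to(local_folder)
    return "_".join(relative.parts)


def hierarchical_blob_name(file_path: str, local_folder: str) -> str:
    """Relative path with forward slashes preserved (PowerShell script)."""
    relative = PurePosixPath(file_path).relative_to(local_folder)
    return relative.as_posix()


if __name__ == "__main__":
    # Hypothetical sample paths
    print(flat_blob_name("/data/hr/handbook.pdf", "/data"))
    print(hierarchical_blob_name("/data/hr/handbook.pdf", "/data"))
```

One consequence of the flattened scheme is that two different files can map to the same blob name, for example `a_b.pdf` at the top level and `b.pdf` inside a folder named `a`. With `overwrite` enabled, the later upload would replace the earlier one.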
50 changes: 50 additions & 0 deletions python/chatwithdata/data/upload_data.sh
@@ -0,0 +1,50 @@
#!/bin/bash

# Description:
# This script uploads all PDF files from the current directory to an Azure Blob Storage container.
# It uses the Azure CLI for authentication and file uploads.

# Usage:
# ./upload_data.sh <storage_account_name> <container_name>

# Check if the required arguments are provided
if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <storage_account_name> <container_name>"
    exit 1
fi

STORAGE_ACCOUNT_NAME=$1
CONTAINER_NAME=$2
LOCAL_FOLDER=$(pwd)

# Check if the container exists, and create it if it doesn't
echo "Checking if container '$CONTAINER_NAME' exists in storage account '$STORAGE_ACCOUNT_NAME'..."
CONTAINER_EXISTS=$(az storage container exists --account-name "$STORAGE_ACCOUNT_NAME" --name "$CONTAINER_NAME" --auth-mode login --query "exists" --output tsv)

if [ "$CONTAINER_EXISTS" != "true" ]; then
    echo "Container '$CONTAINER_NAME' does not exist. Creating it..."
    az storage container create --account-name "$STORAGE_ACCOUNT_NAME" --name "$CONTAINER_NAME" --auth-mode login --output none
    if [ $? -ne 0 ]; then
        echo "Failed to create container '$CONTAINER_NAME'."
        exit 1
    fi
    echo "Container '$CONTAINER_NAME' created successfully."
else
    echo "Container '$CONTAINER_NAME' already exists."
fi

# Upload all PDF files from the current directory and its subdirectories
echo "Uploading PDF files from '$LOCAL_FOLDER' to container '$CONTAINER_NAME'..."
# Read NUL-delimited paths on file descriptor 3 so that file names containing
# spaces are handled correctly and stdin stays free for the az command
while IFS= read -r -d '' -u 3 file; do
    # Generate a unique blob name by replacing directory separators with underscores
    RELATIVE_PATH="${file#"$LOCAL_FOLDER"/}"
    BLOB_NAME="${RELATIVE_PATH//\//_}"
    echo "Uploading '$file' as blob '$BLOB_NAME'..."
    az storage blob upload --account-name "$STORAGE_ACCOUNT_NAME" --container-name "$CONTAINER_NAME" --name "$BLOB_NAME" --auth-mode login --file "$file" --overwrite
    if [ $? -ne 0 ]; then
        echo "Failed to upload '$file'."
    else
        echo "Uploaded '$file' successfully."
    fi
done 3< <(find "$LOCAL_FOLDER" -type f -name "*.pdf" -print0)

echo "Upload process completed."
16 changes: 16 additions & 0 deletions python/chatwithdata/index_config/documentDataSource.json
@@ -0,0 +1,16 @@
{
  "name": "<data_source_name>",
  "description": null,
  "type": "azureblob",
  "subtype": null,
  "credentials": {
    "connectionString": "<connection_string>"
  },
  "container": {
    "name": "<container_name>",
    "query": null
  },
  "dataChangeDetectionPolicy": null,
  "dataDeletionDetectionPolicy": null,
  "encryptionKey": null
}
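The template above uses `<data_source_name>`, `<connection_string>`, and `<container_name>` as placeholders. One possible way to fill them in is to parse the JSON and set the fields directly, which avoids escaping problems that plain string replacement could introduce. This is only a sketch: the function `build_data_source` and the sample values are hypothetical, and how this change actually consumes the template is not shown here.

```python
import json

# Copy of the template from documentDataSource.json
TEMPLATE = """
{
  "name": "<data_source_name>",
  "description": null,
  "type": "azureblob",
  "subtype": null,
  "credentials": {"connectionString": "<connection_string>"},
  "container": {"name": "<container_name>", "query": null},
  "dataChangeDetectionPolicy": null,
  "dataDeletionDetectionPolicy": null,
  "encryptionKey": null
}
"""


def build_data_source(name: str, connection_string: str, container_name: str) -> dict:
    """Return the data source definition with the three placeholders replaced."""
    definition = json.loads(TEMPLATE)
    definition["name"] = name
    definition["credentials"]["connectionString"] = connection_string
    definition["container"]["name"] = container_name
    return definition


if __name__ == "__main__":
    # Illustrative values only
    data_source = build_data_source("docs-ds", "example-connection-string", "docs")
    print(json.dumps(data_source, indent=2))
```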