A lightweight, real-time bird call classifier with a sleek web app, ready for EDGE deployment and instant audio recognition anywhere.
This project was started recently and is a work in progress. It is meant to become a bird call recognition app that can eventually run both in the cloud and on the edge.
This repository will soon be split into separate repositories:
- nightingale (model and python package)
- nightingale-backend (backend with RESTful API using FastAPI, Docker, uvicorn, ...)
The following sections show which tools need to be installed, how to download the example data and model, how to run a local MLflow tracking server with the downloaded model, and how to run the dockerized application.
- Git: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
- Docker Desktop: https://www.docker.com/products/docker-desktop/
- MLflow:
# Install mlflow locally
pip install mlflow
On Windows, if not already done, add the folder containing mlflow.exe to the PATH environment variable. The location of mlflow.exe can be found by running the following command (with a user install, the executable typically sits in a Scripts subfolder below the printed path):
python -m site --user-base
The data pipeline and tracking server are not fully set up yet. As a temporary solution, an example can be run by downloading preprocessed bird call data and the model artifacts of a pre-trained model.
The bird call data covers three different birds, is in .wav format and resampled to 16 kHz, and can be used as input for model inference. The zipped data can be downloaded here: https://www.kaggle.com/datasets/emilcode/nightingale-minimal-example
The pre-trained model is stored in an MLflow file-based metadata store and an artifact store. The zipped folder containing both can be downloaded here: https://www.kaggle.com/models/emilcode/nightingale
After downloading, unzip both folders.
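Once the data is unzipped, the stated 16 kHz sample rate can be sanity-checked before feeding a file to the model. The following is a minimal sketch using only Python's standard `wave` module; the helper names `wav_sample_rate` and `is_expected_rate` are illustrative and not part of the nightingale package, and the sketch assumes the files are plain PCM .wav files, which is what `wave` can read.

```python
import wave

# Sample rate stated for the example data in this README.
EXPECTED_RATE_HZ = 16_000


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in the header of a PCM .wav file."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def is_expected_rate(path, expected=EXPECTED_RATE_HZ):
    """Return True if the file's sample rate matches the expected rate."""
    return wav_sample_rate(path) == expected
```

Calling `is_expected_rate(...)` on one of the unzipped files should return `True` if the file was resampled as described; a `False` result would suggest the file needs resampling first.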
https://mlflow.org/docs/3.1.3/ml/tracking/tutorials/remote-server/
https://mlflow.org/docs/3.1.3/ml/tracking/server/
TODO!!!!
If you don't set MLFLOW_TRACKING_URI to a specific location, MLflow will create an mlruns directory in the current working directory of your Python run (e.g. next to the notebook if the MLflow run was started from the notebook).
MLFLOW_TRACKING_URI= (local file system path or URL of a remote tracking server)
MLFLOW_EXPERIMENT_ID=<experiment_id>
Optional: To change MLFLOW_TRACKING_URI during container startup, add something like "containerEnv": { "MLFLOW_TRACKING_URI": "./mlruns" }, to the .devcontainer/devcontainer.json file.
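To see which tracking location would be in effect for a given environment, the fallback described above can be mirrored in a few lines of Python. This is only an illustrative sketch: it does not call MLflow, the function name `resolve_tracking_uri` is hypothetical, and the `./mlruns` default is an assumption based on the behaviour described in this section.

```python
import os

# Assumption: mirrors the local fallback described in this README.
DEFAULT_TRACKING_URI = "./mlruns"


def resolve_tracking_uri(env=None):
    """Return MLFLOW_TRACKING_URI if it is set and non-empty, else the local default."""
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    return uri or DEFAULT_TRACKING_URI


if __name__ == "__main__":
    print(f"Tracking URI in effect: {resolve_tracking_uri()}")
```

Passing a plain dictionary as `env` makes it easy to check different configurations without touching the real environment.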
If you have a Databricks account and prefer to track your experiments and register your models with the Databricks tracking server and Unity Catalog, you can set up the connection as follows.
Create a .env file in your project root and add:
'''
DATABRICKS_TOKEN=
DATABRICKS_HOST= (something like: https://abc-a1234b12-a123.cloud.databricks.com)
MLFLOW_TRACKING_URI=databricks
MLFLOW_REGISTRY_URI=databricks-uc (for unity catalog)
MLFLOW_EXPERIMENT_ID=<experiment_id>
'''
The .env file will be ignored by git by default, but make sure to NOT commit and push your .env file as it contains credentials not meant to be public.
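Before starting a run, it can help to confirm that the .env file actually contains the expected entries. The sketch below is a deliberately simple, standard-library parser for `KEY=VALUE` lines; the helper names `parse_env_file` and `missing_keys` are hypothetical, and the list of required keys is an assumption taken from the example above. It does not handle the full syntax that dedicated .env libraries support.

```python
from pathlib import Path

# Assumption: keys taken from the .env example in this README.
REQUIRED_KEYS = (
    "DATABRICKS_TOKEN",
    "DATABRICKS_HOST",
    "MLFLOW_TRACKING_URI",
    "MLFLOW_REGISTRY_URI",
)


def parse_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env_file(".env"))` returns an empty list when all four entries are filled in, and otherwise names the ones still missing.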
Open Bash or Git Bash and change to the unzipped folder containing the model.
# Change to directory containing the local mlflow tracking server state
cd LocalMLflowTrackingServer
# Run the tracking server locally and accessible for the app from within a docker container
mlflow server --host 0.0.0.0 --port 5757
To check that the server is up and running, open a browser window and type "localhost:5757" into the address bar. You should see the UI of the MLflow server.
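The same check can be scripted, for instance before starting the application container. The following is a minimal sketch using only the Python standard library: it sends a GET request to the address used above and reports whether the server answered with HTTP 200. The function name `is_server_up` is hypothetical, and the sketch makes no assumption about MLflow-specific endpoints.

```python
import urllib.error
import urllib.request


def is_server_up(url="http://localhost:5757", timeout=3.0):
    """Return True if an HTTP GET to `url` is answered with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


if __name__ == "__main__":
    print("MLflow server reachable:", is_server_up())
```

A `False` result most likely means the server has not been started yet or is listening on a different port.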
As soon as the mlflow tracking server is up and running, the pre-trained model is accessible for the application. To run the application locally, clone the repository, start the docker engine and then build and run the production docker container.
# Clone the repository
git clone https://github.com/emilcode-dev/nightingale
# Navigate to the project folder
cd nightingale
# Build container image
docker build -t nightingale-service:latest .
# Run the application
docker run -p 8080:8080 nightingale-service:latest
After the application has started, open your browser and enter "http://localhost:8080/static/index.html". You should see the locally hosted website. To try it out, upload a .wav file of one of the three bird species downloaded before. Currently only the three bird types from the minimal example data are supported. More will be supported in the future with a more solid data pipeline.
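Besides the web page, the running container could also be exercised from a script. The sketch below uploads a .wav file using only the Python standard library. It assumes the `/predict/` endpoint and the `file` form field that appear in the curl example further down in this README; the helper names `build_multipart` and `predict` are hypothetical, and since the response format is not documented here, the response body is returned as raw text.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field_name, filename, payload, content_type="audio/wav"):
    """Encode one file as a multipart/form-data body.

    Returns a tuple of (body bytes, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def predict(wav_path, url="http://127.0.0.1:8080/predict/"):
    """POST a .wav file to the service and return the raw response text."""
    wav_path = Path(wav_path)
    body, content_type = build_multipart("file", wav_path.name, wav_path.read_bytes())
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

With the container running, `print(predict("<path to a downloaded .wav file>"))` would show whatever the service returns for that recording.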
Developers are encouraged to use the provided development container.
To run the dev container follow these steps:
- Install VS Code: https://code.visualstudio.com/download.
- Install Docker Desktop, if not done yet: https://www.docker.com/products/docker-desktop/.
- Start VS Code and install the Dev Containers extension (ms-vscode-remote.remote-containers).
- Restart VS Code to make sure the extension is active.
- In VS Code go to File -> Open Folder and open the top level folder of this repository in VS Code.
- VS Code will ask you to reopen the folder in the dev container, which you should do.
This will open and connect to the development container defined in .devcontainer/devcontainer.json and .devcontainer/Dockerfile. From then on you are working within the container and you are up and running.
More information on dev containers can be found here: https://code.visualstudio.com/docs/devcontainers/tutorial
At the moment the data pipeline is not ready yet. Data is preprocessed and stored locally.
Open and run notebooks/train_nightingale.ipynb.
cd app/fastapi
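uvicorn main:app --reload --host 127.0.0.1 --port 8000
The application API can be tested with the following command. It targets port 8080, as published by the Docker container; use port 8000 instead to reach the uvicorn server started above.
curl -X POST "http://127.0.0.1:8080/predict/" -H "accept: application/json" -H "Content-Type: multipart/form-data" \
  -F "file=@<path to audio data>XC19645.wav;type=audio/wav"
Contributions are welcome!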
- Fork the project and clone it
- Create a new branch (git checkout -b feature/your-feature)
- Commit your changes (git commit -m 'Add new feature')
- Push to your branch (git push origin feature/your-feature)
- Open a Pull Request
Contributors: Ephraim Eckl, David Pellhammer
- Set up local MLflow tracking server
- Set up remote MLflow tracking server in the cloud or use Databricks
- Use DVC for data versioning
- Integrate ONNX
- IaC with Terraform
- Container orchestration with K8s
- Consider aspects of DevSecOps
This project is licensed under the MIT License; see the LICENSE file for details.


