WIP: Implement LTU model #19

Draft
The-Mats wants to merge 6 commits into main from ltu-model

@The-Mats The-Mats commented Nov 10, 2024

We want to experiment with LTU, a promising model that combines natural language with audio classification to produce a more capable and versatile model. Gong et al. have since released a newer version, LTU-AS, which uses Whisper features and performs better, especially on speech and music. I think we should try LTU first and then maybe LTU-AS!

For inference, they provide shell scripts that start a local web interface for easy interaction, which we don't really need.

The following steps are needed to integrate the model into the BS pipeline:

  • Read paper
  • Download the required model files; there are several, since we also need the LLM (not sure whether we can remove it later)
  • Add missing packages
  • Create a model file that includes the needed code from their inference_gradio.py
  • Add get_embedding()
  • Create experiment and model config
  • Test with BEANS
    • linear probing
    • few-shot
    • fine-tuning?
  • Test with BirdSet
    • linear probing
    • few-shot
    • fine-tuning? Is that possible?
  • ...
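
To make the `get_embedding()` and linear probing steps above more concrete, here is a minimal sketch of how they could fit together. It is only an illustration under assumptions: `FrozenAudioEncoder`, `FRAME_LEN`, `train_linear_probe`, and `predict` are hypothetical names, and a fixed random projection stands in for the frozen LTU backbone. None of this is taken from the LTU code or from our pipeline.

```python
import numpy as np

FRAME_LEN = 160  # hypothetical frame size in samples, chosen only for this toy example


class FrozenAudioEncoder:
    """Hypothetical stand-in for the LTU wrapper (not the real model)."""

    def __init__(self, embed_dim: int = 16, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        # A fixed random projection plays the role of the frozen backbone.
        self.proj = rng.standard_normal((FRAME_LEN, embed_dim)) / np.sqrt(FRAME_LEN)

    def get_embedding(self, waveform: np.ndarray) -> np.ndarray:
        """Return one embedding per clip: (batch, samples) -> (batch, embed_dim)."""
        batch, n_samples = waveform.shape
        n_frames = n_samples // FRAME_LEN
        if n_frames == 0:
            raise ValueError("clip is shorter than one frame")
        frames = waveform[:, : n_frames * FRAME_LEN].reshape(batch, n_frames, FRAME_LEN)
        hidden = np.tanh(frames @ self.proj)  # (batch, n_frames, embed_dim)
        return hidden.mean(axis=1)  # mean-pool over time


def train_linear_probe(emb, labels, n_classes, lr=0.5, epochs=200):
    """Fit a softmax classifier on fixed embeddings; the encoder is never updated."""
    emb = np.asarray(emb, dtype=float)
    n, dim = emb.shape
    weights = np.zeros((dim, n_classes))
    bias = np.zeros(n_classes)
    onehot = np.eye(n_classes)[np.asarray(labels)]
    for _ in range(epochs):
        logits = emb @ weights + bias
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        weights -= lr * emb.T @ grad
        bias -= lr * grad.sum(axis=0)
    return weights, bias


def predict(emb, weights, bias):
    """Return the most likely class index for each embedding."""
    return (np.asarray(emb) @ weights + bias).argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    encoder = FrozenAudioEncoder()
    clips = rng.standard_normal((8, 1600))  # 8 fake clips
    labels = np.array([0, 1] * 4)           # fake labels
    emb = encoder.get_embedding(clips)
    weights, bias = train_linear_probe(emb, labels, n_classes=2)
    print("train accuracy:", (predict(emb, weights, bias) == labels).mean())
```

The point of the split is that only the probe has trainable parameters. The same `get_embedding()` output could feed the few-shot runs, whereas fine-tuning would need gradients through the backbone itself.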

@The-Mats The-Mats marked this pull request as draft November 10, 2024 16:21