Eventually, the volume threshold should be learnt, because background noise differs from one environment to the next. We can do that either in the `record` function itself (e.g., detect the parts of the signal that correspond to voice and treat the rest as background), or in a separate `calibrate_volume_threshold` function that assumes no one is speaking, records for n seconds, and calculates a volume threshold accordingly.
Originally posted by @hello-amal in #68 (comment)
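
To make the second option concrete, here is a minimal sketch of what `calibrate_volume_threshold` might look like. It is not taken from the project: it assumes the n seconds of background audio have already been captured into a plain sequence of float samples, and the `frame_rms` helper, the `frame_size` and `margin` parameters, and the "mean plus a few standard deviations" rule are all illustrative choices.

```python
import math
import statistics
from typing import Sequence


def frame_rms(samples: Sequence[float], frame_size: int) -> list[float]:
    """Split samples into non-overlapping frames and return each frame's RMS volume.

    Hypothetical helper; a trailing partial frame is ignored.
    """
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def calibrate_volume_threshold(
    background: Sequence[float],
    frame_size: int = 1024,   # assumed frame length in samples
    margin: float = 3.0,      # assumed number of standard deviations above the mean
) -> float:
    """Estimate a volume threshold from audio recorded while no one is speaking.

    The threshold is the mean frame RMS plus `margin` population standard
    deviations, so ordinary background fluctuations stay below it.
    """
    volumes = frame_rms(background, frame_size)
    if not volumes:
        raise ValueError("need at least one full frame of background audio")
    return statistics.fmean(volumes) + margin * statistics.pstdev(volumes)
```

With something like this, `record` could compare each incoming frame's RMS against the returned value and treat frames above it as voice. A percentile of the frame volumes would be a reasonable alternative to the mean-plus-margin rule if the background contains occasional loud spikes.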