Today I tried to get voice recognition working with hardware acceleration on an ARM64 system with an NVIDIA GPU. The easiest approach turned out to be writing my own Dockerfile based on the project's own, but using this base image:
FROM nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04
This image comes with all the necessary CUDA support libraries preinstalled, which let me quickly compile, build and publish a Grace Blackwell-compatible Docker image of faster-whisper. Once I've tested it a bit more and set up an automated build pipeline, I'll release it publicly.
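A minimal sketch of such a Dockerfile might look like the one below. It is not the original Dockerfile: the apt packages, the plain `pip3 install faster-whisper` step and the sanity-check `CMD` are assumptions. In particular, the prebuilt CTranslate2 wheels that faster-whisper depends on may not include CUDA support on ARM64. In that case CTranslate2 has to be compiled from source with CUDA enabled, which also needs a `-devel` base image for the compiler rather than the `-runtime` one.

```dockerfile
# Minimal sketch; the steps below are assumptions, not the original Dockerfile.
FROM nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04

# Python 3.10 and pip from the Ubuntu 22.04 archive
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Installs faster-whisper and its CTranslate2 dependency from PyPI.
# On ARM64 the CTranslate2 wheel may be CPU-only; if so, build it from source with CUDA.
RUN pip3 install --no-cache-dir faster-whisper

# Sanity check: prints how many CUDA devices CTranslate2 can see
# (0 means a CPU-only build or no GPU visible to the container)
CMD ["python3", "-c", "import ctranslate2; print(ctranslate2.get_cuda_device_count())"]
```

Running the resulting image with `docker run --gpus all <image>` (which requires the NVIDIA Container Toolkit on the host) should print a non-zero device count if GPU acceleration is actually available.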
The difference between CPU and GPU processing was only about a second, but it does make recognition feel snappy, and it lets you use the large model comfortably instead of the tiny one.