Today I tried to get voice recognition working with hardware acceleration on an ARM64 system with an NVIDIA GPU.  I eventually found the easiest way was to write my own Dockerfile, based on the upstream one but using this base image:

FROM nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04

This comes with all the necessary CUDA support libraries preinstalled, which allowed me to quickly compile, build and publish a Grace Blackwell-compatible Docker image of faster-whisper.  Once I've tested it a bit and set up an automated pipeline, I'll publish it.

The difference between CPU and GPU processing was only a second or so, but that is enough to make recognition feel snappy, and it means you can comfortably use the large model instead of the tiny one.
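
For anyone wanting to try the same trade-off, here is a rough Python sketch of picking the model size based on whether a GPU is present. It is only an illustration under some assumptions: the helper names (`choose_whisper_settings`, `gpu_probably_available`, `transcribe`) are made up for this post, "large-v3" is my guess at which large model you would want, the compute types are reasonable defaults rather than anything I benchmarked, and checking for `nvidia-smi` is a crude stand-in for a proper CUDA check.

```python
import shutil


def choose_whisper_settings(gpu_available: bool) -> dict:
    """Hypothetical helper: large model on the GPU, tiny model on the CPU."""
    if gpu_available:
        # Assumption: "large-v3" with float16 is a sensible GPU choice.
        return {"model_size": "large-v3", "device": "cuda", "compute_type": "float16"}
    # Assumption: int8 keeps the tiny model light on a CPU.
    return {"model_size": "tiny", "device": "cpu", "compute_type": "int8"}


def gpu_probably_available() -> bool:
    """Crude heuristic: treat nvidia-smi on the PATH as 'a GPU is present'."""
    return shutil.which("nvidia-smi") is not None


def transcribe(path: str) -> str:
    """Transcribe an audio file with whichever settings fit this machine."""
    # Imported here so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    settings = choose_whisper_settings(gpu_probably_available())
    model = WhisperModel(
        settings["model_size"],
        device=settings["device"],
        compute_type=settings["compute_type"],
    )
    segments, _info = model.transcribe(path)
    # segments is a lazy generator; joining it runs the actual transcription.
    return " ".join(segment.text.strip() for segment in segments)
```

Calling `transcribe("sample.wav")` (with a file of your own) inside the container should then use the large model when the GPU is visible and fall back to the tiny one otherwise.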
