Collection and storage of speech has become commonplace in recent years,
thanks to the wide availability of the necessary electronic components, such as
microphones and memory. Virtually all personal computers, cell phones, and similar
devices come equipped with them. With billions of people around the world owning
mobile phones, audio recordings are accumulating rapidly, sometimes without the
knowledge of the user. In fact, many business and financial transactions are carried
out over the phone without any authenticating documentation, thus creating a host
of new legal problems. If this mode of business is likely, in the future, to replace
conventional signed paperwork with some degree of regularity, we will need a robust
method for authenticating voice.
Yet, even with the increased use of voice technology, it seems highly unlikely
at present that courts of law will accept current speaker recognition technology
as forensic evidence on a par with signed documents, fingerprints, or DNA.
The reason is that, compared to a fingerprint, DNA, or even a handwritten signature,
voice has far greater variability. In addition, fatigue, a common cold, emotions,
and other factors can alter a voice sample, sometimes beyond recognition.
Indeed, even in everyday life we as humans sometimes misidentify a speaker,
so imagine how difficult it is for a machine to identify a speaker consistently and accurately.
In spite of such limitations, which undoubtedly diminish the evidentiary
weight of speaker identification and verification findings presented to the
court, speaker recognition can still play a significant role as a prime investigative
tool in criminal prosecutions.