On the Use of Automatic Speaker Verification Systems in Forensic Casework



Reference

Title: On the Use of Automatic Speaker Verification Systems in Forensic Casework

Author(s): Johan Koolwaaij & Lou Boves

Reference: Proceedings of the Second International Conference on Audio- and Video-based Biometric Person Authentication (AVBPA-99), Washington, USA, pp. 224-229

Keywords: Speaker Recognition

There is a PostScript version (145423 bytes) available.

Abstract

Automatic Speaker Verification (SV) and Forensic Casework have long been considered essentially unrelated disciplines, because the former was seen as a one alternative forced choice problem, whereas the latter used to be presented as an open set identification problem. However, Doddington has pointed out that many forensic cases boil down to the question of whether a set of recordings, some of which are definitely from the perpetrator and others from a single suspect, do or do not originate from the same speaker. In other words: many forensic cases can be formulated as a one alternative forced choice problem.

One broad class of cases where automatic SV techniques might prove useful in forensic work is the processing of telephone taps made in the investigation of drug trafficking cases. Very often, the perpetrators are foreigners who speak a language unknown not only to the police officers but also to the forensic phoneticians. In many cases the police are interested in knowing how many different speakers are involved in a given set of telephone taps. Leaving the speaker recognition task to interpreters has been shown to be unreliable, if only because of possible links between the interpreters and the criminals. Such links are to be expected if the case is investigated in a small language community, where few persons speak the language. In these cases a text-independent SV system might be of great help.

In all stages of forensic applications of speaker recognition it is important that one is able to state a confidence interval for conclusions regarding the identity of the voices of a known suspect and an unknown perpetrator. If the statement must be used in a court, a specification of the confidence level is necessary to allow the judge to weigh this piece of evidence. If it is to be used during the police investigation, confidence levels will be used to weigh the evidence in setting priorities for investigating specific suspects. In the harassment case described in this paper, the confidence statement was used to decide on how to proceed with the investigation.

It is well known that forensic phoneticians often have difficulty in estimating the confidence level with which they can identify a person by her/his voice. Thus, forensic case workers are interested in knowing to what extent automatic SV systems could be used to obtain an 'objective' confidence estimate.

In this paper we investigate the implications of using an SV system to estimate the confidence level for an identity statement on the basis of a specific case that was brought to our attention by a Dutch private investigations bureau. A male person left obscene messages in the voice mail boxes of female employees of a large IT company. The calls could be traced to handsets in in-house classrooms. Three victims identified the same colleague as the likely perpetrator, but the accused person denied all charges and agreed to collaborate in a test in which he read transcripts of the messages. The speech was recorded in one of the classrooms, using the same handset type and the same voice mail system as during the harassing calls.
However, while the harassing calls were whispered, probably with the intent to sound 'sexy', the test calls were read in a normal voice. Approximately one month after the test recordings, the harassing calls started again, in a whispery voice and from the same classrooms. Now, the obvious question is whether the two sets of harassing calls were made by the same speaker, and whether this speaker is the same person as the one who read the transcripts. Obviously, this problem can be cast in the form of a one alternative forced choice problem: we can take the test calls for building a voice pattern of a known speaker, and try to answer the question of whether all harassing calls were made by the same person.

In this paper we take this case as the starting point to investigate the contingencies of applying the procedures and technology developed for Automatic Speaker Verification to forensic cases that can be formulated as speaker verification problems.