Profile: Steve Cain
Steve Cain was a participant or observer in the following events:
Copies of FBI infrared surveillance tapes taken during the first hours of the FBI assault against the Branch Davidian compound near Waco, Texas (see April 19, 1993), clearly show repeated bursts of rhythmic flashes from agents’ positions and from the compound; two experts hired by the surviving Davidians say the flashes must be gunfire. A third expert, Carlos Ghigliotti, a specialist in thermal imaging and videotape analysis retained by the House Government Reform Committee, says he, too, believes the flashes are gunfire. “The gunfire from the ground is there, without a doubt,” he says. FBI officials have long maintained that no agent fired a shot during either the 51-day standoff or the final assault. Michael Caddell, the lead lawyer for the Davidians in their lawsuit against the government (see April 1995), says he has shown the tapes and the expert analysis to John Danforth, the former senator leading a government investigation into the FBI’s actions during the siege and the assault (see September 7-8, 1999). Caddell says his two experts are former Defense Department surveillance analysts. One of them also says the FBI infrared videotapes released to the public, Congress, and the courts may have been altered. “There’s so much editing on this tape, it’s ridiculous,” says Steve Cain, an audio and video analysis expert who has worked with the Secret Service and the Internal Revenue Service. Cain says his analysis is preliminary because he has not been granted access to the original tapes. But, he says, portions of the tapes appear to have been erased, with significant erasures during the 80-minute period before the compound began burning. Cain says: “It’s just like the 18-minute gap on the Watergate tape. That was erased six times by Rose Mary Woods (see November 21, 1973). That’s why we’re trying to get to the originals.” Cain also says he believes images were inserted into the videotapes, perhaps from different video cameras. Caddell says, “I think at this point, it’s clear that the whole investigation, and particularly the fire investigation, was garbage in-garbage out.” The videotapes were used in a 1993 Treasury Department review of the siege (see Late September - October 1993) and as evidence in a 1994 criminal trial of some of the surviving Davidians (see January-February 1994); both concluded that the Davidians themselves set the fires that consumed the compound. [Associated Press, 10/6/1999; Dallas Morning News, 10/7/1999]
The release of an audio message by a man thought to be Osama bin Laden (see November 12, 2002) prompts several publications to run stories about authenticating the voice on the tape. These articles make several points about the voice analysis of apparent bin Laden recordings:
Machine analysis: Some aspects of voice identification are done by machine. Voice authentication software measures the acoustic qualities of a person’s voice, such as pitch, loudness, basic resonances, frequency, and amplitude. [New Scientist, 11/13/2002; Slate, 11/15/2002] This produces spectrographic information and can also be used to look for specific features of a voice, such as a nasal quality. In addition, every person creates the same sounds using a slightly different set of basic pitches, so the set of frequencies in bin Laden’s vowels, like those in “ea” from “fear,” will be marginally different from anyone else’s. By examining this frequency detail for every vowel and comparing it with previous examples, a machine analysis can tell whether the vowels match and were all spoken by him. [Slate, 11/15/2002] However, “People hardly ever pronounce the same word the same way twice, even in the same utterance,” says Robert Berkovitz, a speech analyst with Sensimetrics Corp. [CBS News, 11/13/2002]
Human analysis: Some aspects of voice identification are done by humans, who are, according to Slate, “very good at doing the kind of thing most people do subconsciously—telling if someone comes from a particular region by recognizing basic vowel and consonant qualities.” For example, a human analyst can tell whether the “Ye” sound in “Yemen” has the right length and stress for bin Laden’s dialect. [Slate, 11/15/2002] Experts listen to previous recordings of bin Laden and compare them with the new one syllable by syllable. [New Scientist, 11/13/2002; Slate, 11/15/2002] Experts can also check whether the words on a tape generally match those used by someone of bin Laden’s age and educational background. [Slate, 11/15/2002]
Quality of tape: According to Slate, the November tape is “allegedly very noisy and possibly went down a phone line at some point.” [Slate, 11/15/2002] However, the New Scientist reports, “Voice analysis experts say the quality of the recording appears good enough to determine if the recording is genuine.” It also quotes Steve Cain of Forensic Tape Analysis, a company that received snippets of the tape from US media: “It seems like it is at least clear enough and there’s enough amplitude of that unknown speaker’s voice that if you had a known sample of bin Laden it would be possible.” [New Scientist, 11/13/2002]
Splicing: Analysis can determine whether a tape has been spliced together. Potential red flags include hitches in timing and rhythm, removal of background noise, and shifts in pitch made to compensate for differences in background noise. [Slate, 11/15/2002]
Language: The language of a recording makes no difference to voice analysis. [CBS News, 11/13/2002]
Uncertainty: The New Scientist quotes Tomi Kinnunen, an expert in computer analysis of speech at the University of Joensuu, Finland, as saying: “There is always the possibility of error.… But if you have a clean sample with little noise, you can quite reliably say [who it is].” [New Scientist, 11/13/2002] However, according to Slate, human and machine analyses can be “formidable,” but “neither type of analysis can say with 100 percent certainty that the speaker on the tape is bin Laden or anyone else.” [Slate, 11/15/2002] CBS reports that intelligence analysts are convinced the tape is from bin Laden, but “they will never be sure,” because “Computer voice analysis lacks the accuracy of fingerprint or DNA identification and can be hamstrung by a skilled impersonator or low-quality recording.” “You can say with some probability, but you can never be sure,” says Kenneth Stevens, a Massachusetts Institute of Technology expert on speech analysis and synthesis. “Where there’s a combination of strong motivation and relatively weak science, there’s an opportunity for deception,” adds Berkovitz. “You can’t put the voice in a slot and have it come out saying, ‘This is Joe Smith.’” [CBS News, 11/13/2002]
One analyst, Matsumi Suzuki of Japan Acoustic Lab, Tokyo, says that, although the recording seems genuine, the speaker sounds ill. [New Scientist, 11/13/2002]
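The machine-analysis point above describes software that measures acoustic features such as pitch. As a rough illustration only, the Python sketch below estimates the fundamental frequency (perceived pitch) of a signal by autocorrelation, a common textbook technique. It is not presented as the method used by Forensic Tape Analysis, Sensimetrics, or any analyst quoted above. The function names, the assumed 60 to 400 Hz speaking-pitch range, and the pure test tone standing in for a voiced sound are all hypothetical choices made for illustration.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of a voiced signal.

    Picks the lag with the largest (biased) autocorrelation within the
    assumed pitch range [fmin, fmax]. Illustrative sketch only.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove any DC offset

    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = min(int(sample_rate / fmin), n - 1)  # longest period considered

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Biased autocorrelation: longer lags sum over fewer terms, which
        # slightly favors shorter periods and helps avoid picking a
        # multiple of the true period.
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def synthetic_tone(f0, sample_rate=8000, duration=0.05):
    """Hypothetical stand-in for a voiced sound: a pure tone at f0 Hz."""
    n = int(sample_rate * duration)
    return [math.sin(2 * math.pi * f0 * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    rate = 8000
    for f0 in (100.0, 125.0):
        est = estimate_pitch(synthetic_tone(f0, rate), rate)
        print(f"true {f0:.0f} Hz, estimated {est:.1f} Hz")
```

As the experts quoted above stress, a single measurement like this cannot identify a speaker. Real recordings are noisy, and practical systems combine many features and still report only a probability.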