Senior Congress leader Abhishek Manu Singhvi wears many hats. The 53-year-old is a Rajya Sabha member, chairman of the Parliamentary Standing Committee on Law, Justice, Personnel and Grievances, a leading Supreme Court lawyer, and a Congress spokesperson to boot, a loud and combative voice of the party on various English news channel chat shows. After the recent surfacing of a CD with an alleged video clip of Singhvi engaged in what seems like a sexual act (which a judicial order has kept from being telecast), he has resigned as party spokesperson and chairman of the parliamentary committee.
Singhvi says he has not quit because he is guilty of what the video clip purports to show, but as an act of sacrifice to deprive an emboldened opposition of yet another allegation to hurl at the Congress-led UPA Government. He has also challenged the authenticity of the recording: “All allegations are patently baseless and false.” He has hinted at a conspiracy theory too, arguing that “people inimically opposed to me” (within or outside the Congress, he does not specify) “who have assiduously spent over ten days hearing, seeing, amplifying and distilling the CD, [have] found no vestige of any reference, not even remotely, to any illegality, corrupt practice or wrongdoing”. This suggests that he knows something about the CD’s provenance that he does not deem fit to share with the public. According to him, the clip has “been accepted thrice over to be fabricated and morphed”. By whom exactly, he does not say.
As any audio-visual professional knows, it is very difficult to morph a few seconds of video footage, let alone 12 minutes of it (though many man-weeks of teamwork could possibly do it), but it is very easy to detect morphing. So, if Singhvi wants his defence to gain credence, the Government should have no objection to a forensic test being conducted to determine the authenticity of the CD, which is what the opposition BJP has demanded.
In the meantime, we at Open sought to examine exactly what it takes to morph such a video. It is not easy at all. The format we use in India, PAL (common in Europe and most parts of Asia), delivers video images at a rate of 25 frames per second. So, to morph a video, each of the 25 frames in every second of footage would have to be individually tampered with. In the case of closed-circuit TV, the rate could be much lower, 10-15 frames per second, which would make the job easier. But still, to ensure that it looks authentic, it would have to be done in fine detail. In addition, one would need two sets of stock images shot from precisely the same angles for them to be morphed together.
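To put the frame rates in perspective, here is a rough back-of-the-envelope sketch of how many individual frames a 12-minute clip would contain. It assumes the clip runs the full 12 minutes mentioned earlier and that every frame needs retouching; the function name `frames_to_alter` is purely illustrative.

```python
# Rough sketch: count the frames in a clip of a given length.
# Assumption: every frame in the clip would need to be altered.

def frames_to_alter(minutes, fps):
    """Return the number of frames in `minutes` of footage at `fps` frames per second."""
    return minutes * 60 * fps

CLIP_MINUTES = 12  # clip length cited in the article

pal_frames = frames_to_alter(CLIP_MINUTES, 25)        # PAL: 25 frames per second
cctv_low_frames = frames_to_alter(CLIP_MINUTES, 10)   # CCTV, lower bound
cctv_high_frames = frames_to_alter(CLIP_MINUTES, 15)  # CCTV, upper bound

print(f"PAL (25 fps): {pal_frames:,} frames")
print(f"CCTV (10-15 fps): {cctv_low_frames:,} to {cctv_high_frames:,} frames")
```

On these assumptions, a PAL recording would mean 18,000 frames to doctor, and even a low-rate CCTV recording would mean between 7,200 and 10,800.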
Such an exercise in Delhi could cost anything from Rs 75,000 to Rs 3 lakh per minute, depending on the detailing involved in the morphing and the desperation of the client. In Mumbai, video units charge on a per-second basis, a figure that ranges from Rs 5,000 to Rs 10,000. Dozens of software packages can do it, though, with Maya and Cinema 4D being the most popular and talked about.
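As a rough illustration of what those rates add up to, the sketch below applies them to a 12-minute clip, the length cited earlier. It assumes the quoted rates apply uniformly across the whole clip and covers the video work alone; the variable names are our own.

```python
# Rough sketch: total video-morphing cost for a 12-minute clip at the quoted rates.
# Assumption: the per-minute and per-second rates apply uniformly to the whole clip.

LAKH = 100_000  # 1 lakh = 100,000 rupees

CLIP_MINUTES = 12
CLIP_SECONDS = CLIP_MINUTES * 60

# Delhi: Rs 75,000 to Rs 3 lakh per minute
delhi_low = 75_000 * CLIP_MINUTES
delhi_high = 3 * LAKH * CLIP_MINUTES

# Mumbai: Rs 5,000 to Rs 10,000 per second
mumbai_low = 5_000 * CLIP_SECONDS
mumbai_high = 10_000 * CLIP_SECONDS

print(f"Delhi:  Rs {delhi_low / LAKH:g} lakh to Rs {delhi_high / LAKH:g} lakh")
print(f"Mumbai: Rs {mumbai_low / LAKH:g} lakh to Rs {mumbai_high / LAKH:g} lakh")
```

By this reckoning, the bill would run from Rs 9 lakh to Rs 36 lakh in Delhi, and from Rs 36 lakh to Rs 72 lakh in Mumbai.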
That is just one part of the morphing. The trickier part is the audio track. The technical term for it is voice ‘transformation’ or ‘conversion’, for which the ‘source’ speaker’s utterances need to be changed in a way that the words appear to be spoken by a ‘target’ speaker.
There are three inter-dependent issues that must be resolved before a voice morphing system can be activated. A research project on voice morphing by Cambridge University’s Department of Engineering lists these as follows: ‘Firstly, it is important to develop a mathematical model to represent the speech signal so that the synthetic speech can be regenerated and prosody can be manipulated without artefacts. Secondly, the various acoustic cues which enable humans to identify speakers must be identified and extracted. Thirdly, the type of conversion function and the method of training and applying the conversion function must be decided.’ It’s tough work.
Arjun Bhagat, CEO of Imak News and Entertainment, which produces animation and special effect videos, explains the effort thus: “Morphing is tricky, as dealing with both aspects of video and audio makes it highly complicated and tedious.” In getting the sequence of events right, while the backgrounds and space dimensions must obviously be made to match, fixing the soundtrack is a particularly critical task. “Voice morphing is far more complicated,” he says. Once you get the frequency and pitch right, you have to worry about idiosyncratic pauses and elocution habits. To match all this is often practically impossible unless both the ‘target’ and ‘source’ speakers cooperate—as is the case with film dubbing.
A senior functionary at a top government forensic organisation calls the Singhvi episode a case of “dancing in the dark”. This is ironic because any government lab could confirm within hours whether the clip is a fabrication or not. He has seen the video and disputes Singhvi’s claim that “some sections of the print and visual media are spreading a falsehood simply by repetition and hearsay that there is [a] reference in the CD to the promise of any post”. In Singhvi’s words, “No one has heard any such reference in the CD. There is none simply because it does not exist.” If there is nothing incriminating on the CD’s audio track, asks the forensic expert, then why would ‘people inimically opposed’ to Singhvi have taken the trouble of putting out a morphed clip at all?