I've created a presentation that goes over these points as well:
http://prezi.com/-29ebxieb4ek/copy-of-rtp-re-assembly/
*UPDATE*
I found this awesome work, which uses Google's voice API to transcribe the audio to text:
http://cheateinstein.com/category-shell/using-google-voice-api-to-transcribe-audio/
I now use this at the very end of the process, to verify that the text heard is what is expected!
I've been working on this VOIP/SIP automation framework for a few months now. I started with a Cucumber framework, and then added some VOIP/SIP-specific tools like SIPP and SIPCLI.
I got to where the test harnesses I built with these tools would use Jenkins to drive traffic to a phone number (at the push of a button, on a schedule, or on a build commit) and verify that the call reached it by the acknowledgements sent back. But what if the phone number was routed to the wrong destination, which still sent back acknowledgements?
At that point I used TollFreeForwarding.com's technology to set an email alert as an endpoint on a phone number. For example, you call 888-888-8888 and get an IVR. You press 1 and are sent to a voicemail, where you pass in audio and hang up. TollFreeForwarding.com then emails the recording to the email address configured on the account.
It was better, but it required a voicemail-to-email application at every endpoint. It also doesn't verify that audio actually occurred on the call. What if no audio played back? Or what if there was enough jitter to make it unintelligible?
To take this testing further, I started thinking about recording the call and analyzing the recording to verify it's what was expected.
This is my first draft at answering that need. It can be improved. But it's a step in the right direction.
What I'm doing
- This automation dials a number with a known IVR or greeting.
- It does a packet capture during the call.
- It filters the RTP channels out of the packet capture and then creates a wav from the pcap file.
- Once there is a wav file, it runs diagnostics on it, generating some visual graphs like the image on this blog. More importantly (and more usefully), it generates audio information that I use as a footprint for the audio playback.
- This audio is also sent to Google, which transcribes it and sends back the text, which is compared to the expected string.
Tools used
- sipp to drive an automated command line sip call
- tshark (command line version of wireshark)
- jenkins (for the GUI to drive and schedule these tests)
- sox (linux based audio conversion and analysis tool)
- some shell scripting
How it Works
The test has a parent job that kicks off two sub-jobs, which run simultaneously. One places a call to a phone number with a recorded greeting/IVR. The other runs a shell script that handles the test itself: it uses tshark to record the packets and filter the RTP, then uses sox to convert the raw audio to a wav and run some analysis on it.
The Shell Script
First I set tshark to record for a specific duration that I think will encompass the call:
tshark -a duration:20 -w /jenkins/userContent/sip_1call.pcap
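In this setup the capture and the call run as two simultaneous Jenkins jobs. For anyone without that arrangement, here is a minimal sketch of doing the same thing inside one script. The function name run_with_capture and the one-second head start are assumptions for illustration, not part of the original jobs, and the call command is left as a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original jobs): run a capture command
# in the background while a call command runs in the foreground.
# Usage: run_with_capture "<capture command>" "<call command>"
run_with_capture() {
  local capture_cmd=$1 call_cmd=$2 call_status capture_pid
  bash -c "$capture_cmd" &     # capture runs concurrently with the call
  capture_pid=$!
  sleep 1                      # assumed head start so the capture is listening
  bash -c "$call_cmd"
  call_status=$?
  wait "$capture_pid"          # block until the capture finishes on its own
  return "$call_status"        # report the call's exit status to the caller
}

# Example with the capture command from above and a placeholder call script:
# run_with_capture \
#   "tshark -a duration:20 -w /jenkins/userContent/sip_1call.pcap" \
#   "./place_call.sh"          # hypothetical script wrapping the sipp call
```

Because the function waits for the capture to end, the pcap file is complete by the time the rest of the script reads it.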
I assign a variable to a tshark task that scans the RTP packets and finds the SSRC, the hex identifier of the RTP stream (I learned these three parts from an online tutorial, but lost the bookmark):
ssrc=$(tshark -n -r /jenkins/userContent/sip_1call.pcap -R rtp -T fields -e rtp.ssrc -Eseparator=, | sort -u | awk 'FNR ==1 {print}')
The above would return a hex value like:
0x344292302
Which is followed by:
sudo tshark -n -r /jenkins/userContent/sip_1call.pcap -R rtp -R "rtp.ssrc == $ssrc" -T fields -e rtp.payload | tee payloads
The above filters on the hex value captured previously and saves the RTP payload of each matching packet to a file named payloads.
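One caveat about the SSRC lookup: a capture of a two-way call can hold more than one RTP stream, and sort -u followed by FNR == 1 simply keeps whichever SSRC sorts first. As a hedged alternative, the sketch below picks the SSRC that appears on the most packets. That heuristic and the helper name most_common_ssrc are assumptions; which stream you actually want depends on your call flow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one SSRC per line on stdin and print the one
# that occurs most often (i.e. the stream with the most packets).
most_common_ssrc() {
  awk 'NF' | sort | uniq -c | sort -k1,1nr | awk 'NR == 1 { print $2 }'
}

# Example, reusing the field extraction from above (without "sort -u", so
# that every packet contributes one line):
# ssrc=$(tshark -n -r /jenkins/userContent/sip_1call.pcap -R rtp \
#          -T fields -e rtp.ssrc | most_common_ssrc)
```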
Finally, we have a for statement in the shell script to convert the payload values from above into a raw audio file:
for payload in `cat payloads`; do IFS=:; for byte in $payload; do printf "\\x$byte" >> /jenkins/userContent/sip_1call.raw; done; done
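A side effect worth knowing about in the loop above: IFS=: stays set for the rest of the script once the loop has run. Here is a minimal sketch of the same decoding step wrapped in a function that leaves IFS alone. The name hex_to_raw is hypothetical, and stripping the colons first is an assumption so that it also copes with payloads printed as one unbroken hex string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read hex payload lines on stdin (with or without ":"
# separators) and write the decoded bytes to stdout.
hex_to_raw() {
  local line i
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line//:/}                      # drop ":" separators if present
    for (( i = 0; i + 1 < ${#line}; i += 2 )); do
      printf "\\x${line:i:2}"             # emit one byte per hex pair
    done
  done
}

# Example:
# hex_to_raw < payloads > /jenkins/userContent/sip_1call.raw
```

Redirecting with > also means a re-run starts from an empty file instead of appending to the audio left over from a previous run.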
At this point I had a raw audio file. I found a Linux tool called sox that was a good fit for this conversion, so I installed it and added these lines to my script.
Sox is then invoked to convert the raw audio to a wav:
sox -t raw -r 8000 -v 4 -c 1 -U /jenkins/userContent/sip_1call.raw /jenkins/userContent/sip_1call.wav
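A quick sanity check fits here: with the flags above (8000 samples per second, one channel, u-law), each byte of the raw file should be one sample, so the file size gives the expected duration. This is a minimal sketch, assuming 8-bit u-law; the helper name raw_duration is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expected duration in seconds of headerless audio,
# assuming one byte per sample (8-bit u-law, mono).
# Usage: raw_duration <byte count> [sample rate, default 8000]
raw_duration() {
  awk -v bytes="$1" -v rate="${2:-8000}" 'BEGIN { printf "%.3f\n", bytes / rate }'
}

# Example:
# raw_duration "$(wc -c < /jenkins/userContent/sip_1call.raw)"
```

For instance, 15680 bytes works out to 15680 / 8000 = 1.96 seconds. A result near zero would point to a call where no audio came back at all.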
Then I run a couple more Sox commands:
This one creates stats, which Jenkins captures in the log file of the test run:
sox /jenkins/userContent/sip_1call.wav -n stat
The stats generated will look like this:
Samples read: 15680
Length (seconds): 1.960000
Scaled by: 2147483647.0
Maximum amplitude: 0.425659
Minimum amplitude: -0.285034
Midline amplitude: 0.070313
Mean norm: 0.043354
Mean amplitude: -0.000055
RMS amplitude: 0.070984
Maximum delta: 0.243896
Minimum delta: 0.000000
Mean delta: 0.019919
RMS delta: 0.034190
Rough frequency: 613
Volume adjustment: 2.349
Two of these values, the rough frequency and the maximum amplitude, seem to be consistent for the same audio. At this point, that's what the test assertion is based on. I have a better plan in the works for a future upgrade to this test. But for now, I'm using the rough frequency and max amplitude to determine the pass / fail criteria.
Is it perfect? No. There's potential for false negatives. The rough frequency *could* change, but so far it hasn't for the same audio I expect.
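The pass/fail check on those two values can be sketched as below. This is a sketch under assumptions, not the exact assertion used in the test: the helper names, the expected values and the tolerances are all placeholders to tune for your own recording. Note that sox prints the stat report on stderr, hence the 2>&1 in the example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for asserting on `sox <file> -n stat` output.

# Print the value for a label, e.g. "Rough frequency". Runs of spaces in the
# report's labels are collapsed before comparing.
stat_value() {
  awk -F: -v label="$1" '
    { key = $1; gsub(/[ \t]+/, " ", key) }
    key == label { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Succeed when |actual - expected| <= tolerance.
within_tolerance() {
  awk -v a="$1" -v e="$2" -v t="$3" \
    'BEGIN { d = a - e; if (d < 0) d = -d; exit !(d <= t) }'
}

# Example (expected values and tolerances are placeholders):
# stats=$(sox /jenkins/userContent/sip_1call.wav -n stat 2>&1)
# freq=$(printf '%s\n' "$stats" | stat_value "Rough frequency")
# amp=$(printf '%s\n' "$stats" | stat_value "Maximum amplitude")
# within_tolerance "$freq" 613 10 && within_tolerance "$amp" 0.425659 0.05 \
#   || { echo "FAIL audio footprint does not match"; exit 1; }
```

Comparing within a tolerance rather than for exact equality leaves a little room for the small variations a call can pick up between runs.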
Spectrograms
If you're into spectrograms (and who isn't?), sox will also output one if you like. I end the shell script with this:
sox /jenkins/userContent/sip_1call.wav -n spectrogram -y 2 -l -o /jenkins/userContent/sip_1call.png
If anyone has any other tools that can pull out more data, please let me know.
The Upshot?
One shell script, called by Jenkins, running 3 tools gets this job done.
Verify Audio via Speech To Text
A few people approached me and mentioned that the rough frequency may not remain constant as the test call goes through different hops. So I began to investigate this some more, and I found this guy:
http://cheateinstein.com/category-shell/using-google-voice-api-to-transcribe-audio/
He had created a way to use a shell script to send audio files to Google for transcription.
I modified his script a little to work for my needs, and added a text assertion. If the text fails the comparison, I exit the script with an error code, which forces Jenkins to regard this as a total failure.
Here's the part I added to the bottom of my previous script:
echo "1 - Translate with SOX - Convert WAV to FLAC with 16000"
sox /jenkins/userContent/sip_1call.wav input.flac rate 16k
echo "2 - Submit to Google Voice API"
wget -q -U "Mozilla/5.0" --post-file input.flac --header="Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" > output.ret
echo "3 - Extract recognized text"
cat output.ret | sed 's/.*utterance":"//' | sed 's/","confidence.*//' > output.txt
echo "4 - Display text"
a=$(cat output.txt)
echo "$a"
b="tollfreeforwarding.com"
# Compare the transcription against the expected string
if [ "$a" = "$b" ];
then
echo "Verified audio is $b"
else
echo "FAIL audio is not $b"
# Any non-zero status fails the Jenkins build; 666 would wrap around to 154
exit 1
fi
In my scenario, I've seeded the greeting on the number being called with an announcement that says, "Toll Free Forwarding Dot Com", which Google correctly turns into "tollfreeforwarding.com", and I validate against that.
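Because the comparison above is an exact string match, a stray capital letter or trailing space in the transcription would fail the test. Here is a minimal sketch of a more forgiving comparison. The helper names are hypothetical, and the JSON shape it expects is only inferred from the sed expressions above, so treat it as an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing a transcription with the expected text.

# Print the utterance value from the first line of stdin that contains one.
extract_utterance() {
  sed -n 's/.*"utterance":"\([^"]*\)".*/\1/p' | head -n 1
}

# Lowercase the text and trim leading/trailing whitespace.
normalize_text() {
  tr '[:upper:]' '[:lower:]' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Succeed when the two strings match after normalization.
same_text() {
  [ "$(printf '%s' "$1" | normalize_text)" = "$(printf '%s' "$2" | normalize_text)" ]
}

# Example:
# heard=$(extract_utterance < output.ret)
# same_text "$heard" "tollfreeforwarding.com" || { echo "FAIL"; exit 1; }
```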

