The process of VoIP
Carrying voice over VoIP is conceptually simple. The voice is transferred in the following steps.
1. First, the analog voice is converted to a digital signal (bits).
2. Next, the bits are compressed into a format suitable for transmission.
3. The voice data is then inserted into data packets using a real-time protocol (typically RTP over UDP over IP).
4. A signaling protocol is needed to call users; ITU-T H.323 does that.
5. At the receiver (RX), the packets are disassembled, the data is extracted and converted back to an analog voice signal, and the result is sent to the sound card (or phone).
6. All of this must happen in real time, because we cannot wait too long for a spoken answer (see the QoS section).
Base architecture
Voice - ADC - Compression Algorithm - Assembling RTP in TCP/IP ----> |
<---- |
Voice - DAC - Decompression Algorithm - Disassembling RTP from TCP/IP
How is analog converted to digital?
This is done by dedicated hardware called an analog-to-digital converter (ADC).
Today every sound card can convert, at 16 bits per sample, a band of 22050 Hz (by the Nyquist principle this requires a sampling frequency of 44100 Hz), giving a throughput of 2 bytes * 44100 (samples per second) = 88200 bytes/s, or 176.4 kBytes/s for a stereo stream.
VoIP does not need that much throughput (176 kBytes/s) to send voice packets; the codings used for it are described next.
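The arithmetic above can be checked with a few lines of Python. This is only a sketch; the function name pcm_throughput_bytes is made up for illustration.

```python
def pcm_throughput_bytes(sample_rate_hz, bits_per_sample, channels=1):
    """Return the uncompressed PCM throughput in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels

# Sound-card quality, as in the text: 16 bits at 44100 Hz.
mono = pcm_throughput_bytes(44100, 16)
stereo = pcm_throughput_bytes(44100, 16, channels=2)

# Telephone quality: 8 bits at 8000 Hz (the 64 kbit/s case discussed later).
telephone = pcm_throughput_bytes(8000, 8)

print(mono, stereo, telephone)
```

The telephone-quality figure is more than twenty times smaller than the stereo one, which is why VoIP starts from a much lower sampling rate.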
Compression Algorithms
Now that we have digital data, we can convert it to a standard format that can be transmitted quickly.
PCM, Pulse Code Modulation, Standard ITU-T G.711
1. Throughput is 8000 Hz * 8 bits = 64 kbit/s, the same as a typical digital phone line.
2. Each sample is represented with 8 bits (256 possible values).
3. Voice bandwidth is 4 kHz, so the sampling frequency has to be 8 kHz (Nyquist).
4. Real applications use the mu-law (North America) and A-law (Europe) variants, which code the analog signal on a logarithmic scale, so that 8 bits cover a dynamic range that would take roughly 13 or 14 bits with linear coding (see Standard ITU-T G.711).
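To make the logarithmic coding concrete, here is a minimal sketch of mu-law companding using the continuous textbook formula with mu = 255. It is not the exact segment-based encoder defined in G.711, and the function names are illustrative only.

```python
import math

MU = 255  # companding parameter of the mu-law curve

def mulaw_compress(x):
    """Map a linear sample in [-1.0, 1.0] onto the logarithmic mu-law scale."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize_8bit(y):
    """Quantize a companded value in [-1.0, 1.0] to one of 256 codes (0..255)."""
    return min(255, int((y + 1.0) / 2.0 * 256))

def dequantize_8bit(code):
    """Map a code back to the centre of its quantization interval."""
    return (code + 0.5) / 256 * 2.0 - 1.0

for x in (0.001, 0.01, 0.1, 0.5, 1.0):
    code = quantize_8bit(mulaw_compress(x))
    restored = mulaw_expand(dequantize_8bit(code))
    print(f"{x:6.3f} -> code {code:3d} -> {restored:.4f}")
```

Quiet samples get many more codes than loud ones, so the relative error stays small across the whole range even though only 8 bits are sent.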
ADPCM, Adaptive differential PCM, Standard ITU-T G.726
It encodes only the difference between the current sample and a prediction based on the previous ones, requiring 32 kbps (see Standard ITU-T G.726).
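The idea of sending differences can be sketched with plain differential PCM. This is a deliberate simplification: the quantizer step is fixed here, whereas G.726 adapts it to the signal, and the names STEP, dpcm_encode and dpcm_decode are hypothetical. Sending 4 bits per sample at 8000 samples per second gives the 32 kbps mentioned above.

```python
STEP = 16                    # fixed quantizer step (real ADPCM adapts this)
MIN_CODE, MAX_CODE = -8, 7   # range of a signed 4-bit code

def dpcm_encode(samples):
    """Encode each sample as a 4-bit quantized difference from the prediction."""
    codes = []
    predicted = 0
    for sample in samples:
        diff = sample - predicted
        code = max(MIN_CODE, min(MAX_CODE, round(diff / STEP)))
        codes.append(code)
        predicted += code * STEP  # follow what the decoder will reconstruct
    return codes

def dpcm_decode(codes):
    """Rebuild the samples by accumulating the decoded differences."""
    samples = []
    predicted = 0
    for code in codes:
        predicted += code * STEP
        samples.append(predicted)
    return samples

samples = [0, 20, 45, 60, 58, 40, 10, -15]
codes = dpcm_encode(samples)
decoded = dpcm_decode(codes)
print(codes)
print(decoded)
```

The encoder tracks the decoder's reconstruction rather than the true previous sample, so quantization errors do not accumulate.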
LD-CELP, Standard ITU-T G.728
CS-ACELP, Standard ITU-T G.729 and G.729a
MP-MLQ, Standard ITU-T G.723.1, 6.3kbps, Truespeech
ACELP, Standard ITU-T G.723.1, 5.3kbps, Truespeech
LPC-10, able to reach about 2.4 kbps
These last protocols are the most important because they can guarantee a very low minimum bandwidth by using source coding. The G.723.1 codecs also have a very high MOS (Mean Opinion Score, used to measure voice fidelity), but pay attention to the processing power they require, up to 26 MIPS.
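The codec bit rate is not the whole story, because every packet also carries IP, UDP and RTP headers. The sketch below estimates the bandwidth at the IP level for some of the bit rates listed above. The 30 ms packet interval is an assumption chosen for illustration, the header sizes are the minimum ones (no IP options, no RTP extensions), and link-layer overhead is ignored.

```python
import math

HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP fixed header

def payload_bytes(bitrate_bps, interval_ms):
    """Bytes of coded voice carried by one packet, rounded up to whole bytes."""
    return math.ceil(bitrate_bps * interval_ms / 8000)

def wire_bitrate_bps(bitrate_bps, interval_ms):
    """Bit rate at the IP level, including the per-packet headers."""
    packet_bytes = payload_bytes(bitrate_bps, interval_ms) + HEADER_BYTES
    return packet_bytes * 8 * 1000 / interval_ms

INTERVAL_MS = 30  # assumed packetization interval
for name, bitrate in [("G.711", 64000), ("G.726", 32000),
                      ("G.723.1 MP-MLQ", 6300), ("G.723.1 ACELP", 5300)]:
    size = payload_bytes(bitrate, INTERVAL_MS)
    total = wire_bitrate_bps(bitrate, INTERVAL_MS) / 1000
    print(f"{name:15s} payload {size:3d} bytes, about {total:.1f} kbit/s with headers")
```

For the low bit rate codecs the headers are larger than the voice payload itself, so the saving on the wire is smaller than the codec figures alone suggest.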
RTP Real Time Transport Protocol
Now we have the raw data and want to encapsulate it in the TCP/IP stack. The layering is as follows:
VoIP data packets
RTP
UDP
IP
Layers 1 and 2 (physical and data link)
VoIP data packets travel in RTP (Real-Time Transport Protocol) packets, which are carried inside UDP/IP packets.
First, VoIP does not use TCP because it is too heavy for real-time applications, so UDP (datagram) is used instead.
Second, UDP has no control over the order in which packets arrive at the destination or how long they take to get there (the datagram concept). Both are very important to overall voice quality (how well you can understand what the other person is saying) and conversation quality (how easy it is to carry out a conversation). RTP solves the problem by enabling the receiver to put the packets back into the correct order and to avoid waiting too long for packets that have been lost or are taking too long to arrive (we do not need every single voice packet, but we do need a continuous, ordered flow of most of them).
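The reordering behaviour can be sketched with a toy receive buffer keyed on the RTP sequence number. This is an assumption-laden illustration, not how any particular RTP stack works: the class name ReorderBuffer and the max_wait rule (give up on a gap once that many later packets are waiting) are invented here, and 16-bit sequence number wrap-around is ignored.

```python
import heapq

class ReorderBuffer:
    """Releases packets in sequence order and skips packets that never arrive."""

    def __init__(self, max_wait=3):
        self.max_wait = max_wait  # buffered packets tolerated before skipping a gap
        self.heap = []            # min-heap of (sequence number, payload)
        self.next_seq = None      # next sequence number to play

    def push(self, seq, payload):
        if self.next_seq is not None and seq < self.next_seq:
            return  # arrived too late, playback has moved on
        heapq.heappush(self.heap, (seq, payload))

    def pop_ready(self):
        ready = []
        while self.heap:
            seq = self.heap[0][0]
            if self.next_seq is None or seq == self.next_seq:
                ready.append(heapq.heappop(self.heap))
                self.next_seq = seq + 1
            elif seq < self.next_seq:
                heapq.heappop(self.heap)  # duplicate of a packet already played
            elif len(self.heap) >= self.max_wait:
                self.next_seq = seq       # stop waiting for the missing packet
            else:
                break                     # keep waiting for the gap to fill
        return ready

def play(arrivals, max_wait=3):
    """Feed packets in arrival order and return the sequence numbers played."""
    buf = ReorderBuffer(max_wait)
    played = []
    for seq, payload in arrivals:
        buf.push(seq, payload)
        played.extend(s for s, _ in buf.pop_ready())
    return played

# Packet 3 overtakes packet 2, and packet 4 is lost.
arrivals = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (6, "f"), (7, "g")]
print(play(arrivals))
```

A real receiver would decide when to give up based on playout time rather than on a packet count, but the trade-off is the same: waiting longer recovers more packets at the cost of added delay.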
The Real Time Transport Protocol header contains, among others, the following fields:
1. V indicates the version of RTP used.
2. P is the padding flag: when set, the packet ends with extra padding bytes that are not part of the payload, used to bring the packet up to a required size.
3. X indicates the presence of a header extension.
4. CC is the number of CSRC identifiers that follow the fixed header. CSRC fields are used, for example, in conferences.
5. M is a marker bit.
6. PT is the payload type.
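The fields above occupy the first two bytes of the 12-byte fixed RTP header. The sketch below packs and unpacks that header with Python's struct module. The sequence number, timestamp and SSRC fields are not described in the list above; they come from the RTP specification. The function names and the sample values are illustrative.

```python
import struct

def build_rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
    """Build a 12-byte fixed RTP header (version 2, no padding, extension or CSRC)."""
    byte0 = 2 << 6                          # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | payload_type    # M bit and 7-bit PT
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

def parse_rtp_header(packet):
    """Decode the fixed RTP header found at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "padding": (byte0 >> 5) & 1,
        "extension": (byte0 >> 4) & 1,
        "csrc_count": byte0 & 0x0F,
        "marker": byte1 >> 7,
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Payload type 0 is G.711 mu-law in the standard RTP audio profile.
header = build_rtp_header(payload_type=0, seq=1234, timestamp=160, ssrc=0xDEADBEEF)
fields = parse_rtp_header(header + b"voice payload bytes")
print(fields)
```

The "!" in the format string selects network byte order, which is what RTP uses on the wire.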
RSVP
This protocol is part of a larger effort to enhance the current Internet architecture with support for Quality of Service flows. RSVP is used by a host to request specific qualities of service from the network for particular application data streams or flows. RSVP is also used by routers to deliver quality-of-service (QoS) requests to all nodes along the path(s) of the flows and to establish and maintain state to provide the requested service. RSVP requests will generally result in resources being reserved in each node along the data path.
In VoIP, RSVP is one of the protocols that can manage Quality of Service (QoS).
It is a signaling protocol that requests a certain amount of bandwidth and a latency bound at every network hop that supports it.
Quality of Service (QoS)
VoIP requires real-time data streaming, because a smooth, interactive voice exchange depends on it.
Now the call quality monitoring software specialist Telchemy (Suwanee, Ga.) has teamed with Swedish voice processing technology group Global IP Sound (GIPS) to develop and market speech processing and performance management solutions for VoIP equipment and service providers.
Unfortunately, TCP/IP cannot guarantee real-time delivery; it only makes a "best effort". So we need to introduce tricks and policies that manage the packet flow on every router we cross.
Here are the ways to make VoIP behave as a real-time data stream:
1. The TOS field in the IP header, which describes the type of service: packets can be marked with a precedence level and a low-delay flag so that routers handle real-time traffic with more urgency.
2. Packet queuing methods:
1. FIFO (First In First Out), which forwards packets in order of arrival.
2. WFQ (Weighted Fair Queuing), which shares the link fairly among flows (for example, FTP cannot consume all available bandwidth) depending on the kind of data flow, typically one packet for UDP and one for TCP in turn.
3. CQ (Custom Queuing), in which users decide the priority.
4. PQ (Priority Queuing), in which there are a number of queues (typically 4), each with its own priority level: packets in the first queue are sent first, then (when the first queue is empty) sending starts from the second one, and so on.
5. CB-WFQ (Class Based Weighted Fair Queuing), like WFQ but with the addition of classes (up to 64), each with an associated bandwidth value.
3. Shaping capability, which limits the source to a fixed bandwidth in:
1. download
2. upload
4. Congestion Avoidance, like RED (Random Early Detection).
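Of the queuing methods listed above, Priority Queuing is the easiest to sketch. The class below is a minimal illustration, assuming four queues where index 0 is the most urgent; the class name and packet labels are made up, and a real router would also bound the queue lengths.

```python
from collections import deque

class PriorityQueuing:
    """Strict priority scheduler: a queue is served only when all higher ones are empty."""

    def __init__(self, levels=4):
        self.queues = [deque() for _ in range(levels)]  # index 0 = highest priority

    def enqueue(self, packet, priority):
        self.queues[priority].append(packet)

    def dequeue(self):
        """Return the next packet to send, or None if every queue is empty."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None

scheduler = PriorityQueuing()
scheduler.enqueue("ftp-1", priority=3)
scheduler.enqueue("voice-1", priority=0)
scheduler.enqueue("ftp-2", priority=3)
scheduler.enqueue("voice-2", priority=0)

sent = []
while (packet := scheduler.dequeue()) is not None:
    sent.append(packet)
print(sent)
```

Voice packets always leave first, which keeps their delay low. The drawback is that a busy high-priority queue can starve the lower ones, which is one reason weighted schemes such as WFQ exist.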
H.323 Signaling Protocol
The H.323 protocol is used, for example, by Microsoft NetMeeting to make VoIP calls.
This protocol allows a variety of elements to talk to each other:
1. Terminals, clients that initiate VoIP connections. Although terminals could talk to each other without anything else, some additional elements are needed for scalability.
2. Gatekeepers, which essentially provide:
1. address translation service, to use names instead of IP addresses
2. admission control, to allow or deny some hosts or some users
3. bandwidth management
3. Gateways, points of reference for conversion between TCP/IP and the PSTN.
4. Multipoint Control Units (MCUs), to provide conferencing.
5. Proxy servers are also used.
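The three gatekeeper roles listed above can be illustrated with a toy model. This is a conceptual sketch only: it does not implement any H.323 message, and the class ToyGatekeeper, its methods, the aliases and the addresses are all hypothetical.

```python
class ToyGatekeeper:
    """Conceptual model of address translation, admission control and bandwidth management."""

    def __init__(self, total_bandwidth_kbps):
        self.addresses = {}                     # alias -> IP address
        self.available_kbps = total_bandwidth_kbps

    def register(self, alias, ip_address):
        """Address translation: remember which address an alias maps to."""
        self.addresses[alias] = ip_address

    def admit(self, caller, callee, kbps):
        """Return the callee's address if the call is admitted, otherwise None."""
        if caller not in self.addresses or callee not in self.addresses:
            return None                         # admission control: unknown endpoint
        if kbps > self.available_kbps:
            return None                         # bandwidth management: no capacity left
        self.available_kbps -= kbps
        return self.addresses[callee]

    def release(self, kbps):
        """Give back the bandwidth of a finished call."""
        self.available_kbps += kbps

gatekeeper = ToyGatekeeper(total_bandwidth_kbps=128)
gatekeeper.register("alice", "10.0.0.1")
gatekeeper.register("bob", "10.0.0.2")

print(gatekeeper.admit("alice", "bob", kbps=64))      # resolved address of bob
print(gatekeeper.admit("mallory", "bob", kbps=64))    # unregistered caller is refused
```

Centralising these decisions in one element is what lets terminals stay simple while the network as a whole scales.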