Technische Universität Ilmenau
Fakultät für Elektrotechnik und Informationstechnik
Diplomarbeit (Diploma Thesis)
Further development of VoIP softphone based on 'Microsoft RTC Client API'
Submitted by: Carla García Sánchez
Submitted on: 15.11.2006
Date of birth:
Degree programme: Electrical Engineering and Information Technology
Prepared in the department: Kommunikationsnetze (Communication Networks)
Fakultät für Elektrotechnik und Informationstechnik
Responsible professor: Prof. Dr. rer. nat. habil. Jochen Seitz
Academic supervisor: Dipl.-Ing. Yevgeniy Yeryomin
Acknowledgements
Many people have helped me in one way or another during the course of this project.
With these lines, I would like to express my most sincere gratitude to them.
To my professors: thank you for guiding and advising me at every moment. Every
suggestion has helped to improve this work. I also appreciate all the support from the
staff of the Department of Communication Networks.
To my family and friends: thank you for your unconditional support and for encouraging
me in the hardest and most stressful moments. I appreciate that you have been there
for me and trusted me. I especially want to thank my roommates and close friends in
Ilmenau, who have shared everyday life with me these last months.
Finally, I would like to thank TU Ilmenau for allowing me to carry out this project.
Once again, thank you everyone.
Abstract
Voice over Internet Protocol (VoIP), as its name says, is a technology that enables
voice communication over IP networks. VoIP has become a widespread technology because
it makes real-time communication easier and more natural, regardless of where people
are located.
This project continues the development of a SIP-based VoIP softphone that was
implemented as part of a PhD thesis in the Department of Communication Networks. One
of the aims of the project is to study the availability of this technology in a mobile
environment and to adapt the softphone to mobile devices.
A softphone is a software application that uses VoIP technology to place telephone
calls from a computer to other softphones or to conventional telephones. It also
supports additional functionality that enables services for the end user that would
not be possible with the current telephone network. Examples include locating users
regardless of where they are connected, and videoconference calls with multiple
participants.
Before starting to develop the software application, it is essential to understand the
operation and structure of softphones based on the Session Initiation Protocol (SIP),
the protocol responsible for establishing the VoIP session between users. For that
purpose, the first part of this project is a survey of VoIP technology and of the
protocols related to the VoIP environment, such as the Session Initiation Protocol, the
Session Description Protocol (SDP) and the Real-time Transport Protocol (RTP).
Nowadays, there are many softphones running on diverse operating systems and
programmed in different languages. Although they must all follow the same basic
structure, they can differ considerably in the extra features they provide and in the
platform on which they are built. This application uses the Microsoft RTC Client API,
which supplies the libraries and interfaces required to implement the functionality
of the VoIP protocols mentioned above.
Some of the new features that will be added to this software application are:
• Contact list management: it allows users to store information about their contacts
and access it easily. Furthermore, it informs users about the presence (availability)
of their buddies.
• Videoconference calls: to improve communication between people, multimedia calls
with both audio and video make conversations more realistic.
Although only a few functionalities are developed here, the capabilities of the
softphone could be extended with new ones according to future user needs and
communication requirements.
Contents
1 VoIP Technology based on SIP
1.1 Introduction
1.2 VoIP Features
1.3 Advantages
1.4 Types of VoIP calls
1.5 Operation
1.6 VoIP protocols
1.6.1 Session Initiation Protocol (SIP)
1.6.1.1 Introduction
1.6.1.2 Protocol Design
1.6.1.3 SIP Clients and Servers
1.6.1.4 SIP Messages
1.6.2 Session Description Protocol (SDP)
1.6.2.1 Introduction
1.6.2.2 Operation
1.6.3 Real-time Transport Protocol (RTP)
1.6.3.1 Real-time Transport Control Protocol (RTCP)
1.7 VoIP Clients
1.7.1 VoIP Clients running on different OS
1.7.2 VoIP Clients for mobile devices
1.7.3 Structure and operation of softphones
1.7.3.1 Registration procedure
1.7.3.2 Multimedia session establishment
1.7.4 Softphones for Windows Mobile OS
1.7.5 OS for mobile devices
Diplomarbeit Carla García Sánchez
2 Microsoft RTC Client API
2.1 Introduction
2.2 Object Model Overview
2.3 Architecture
2.4 .NET Platform
2.4.1 Introduction
2.4.2 Operation
2.4.3 Advantages
3 Development of VoIP softphone for Windows 2000/XP
3.1 Understanding the code source
3.2 New functionalities
3.2.1 Volume bar for microphone and speakers
3.2.2 Sending DTMF signals
3.2.3 Addition of videoconference
3.2.4 Contact List
3.2.5 Encryption of media
3.3 Testing the program and results
3.3.1 Volume bar for microphone and speakers
3.3.2 Sending DTMF signals
3.3.3 Addition of videoconference
3.3.4 Contact List
3.3.5 Encryption of media
3.4 Software tools
4 Adaptation of the VoIP softphone for mobile devices
5 UML Structure
5.1 Class diagram
5.2 Use case diagram
5.3 Sequence diagram
5.4 State diagram
5.4.1 Buddy state diagram
5.4.2 Watcher state diagram
5.4.3 Session state diagram
5.4.4 Client state diagram
6 Getting Started
6.1 Software requirements
6.2 Getting an account
6.3 Description of Graphical User Interface
A UML Diagrams
A.1 Class diagram
A.2 Use case diagram
A.3 Sequence diagram
A.4 Buddy state diagram
A.5 Watcher state diagram
A.6 Session state diagram
A.7 Client state diagram
Bibliography
List of Figures
List of Tables
List of Abbreviations and Symbols
Thesis of Diplomarbeit
Declaration (Erklärung)
1 VoIP Technology based on SIP
1.1 Introduction
VoIP (Voice over Internet Protocol) is simply the transmission of voice traffic over
IP-based networks. It is also called IP telephony, Internet telephony, broadband
telephony or digital phone. Companies providing VoIP service are usually known as VoIP
providers, and the protocols used to carry voice signals over the IP network are known
as VoIP protocols. Although the Internet Protocol (IP) was originally designed for data
networking, its success in becoming the world standard for data networking has led to
its use for voice networking as well.
VoIP uses a broadband Internet connection to route telephone calls, as opposed to
conventional circuit-switched telephone lines. This lowers the cost of communication
for consumers. Perhaps the most interesting point of the technology for the user is
that the existing infrastructure does not need to be reconfigured. The only requirement
is software and hardware support that combines Internet access and a conventional phone
into one single service.
1.2 VoIP Features
The biggest advantage of VoIP is that customers can make and receive calls from
anywhere in the world where a broadband Internet connection is available, without
changing their phone number. This is known as mobility. A person does not need multiple
numbers (office, home, mobile, and so on), because calls can be automatically routed to
the VoIP phone where the user is registered. Customers can take their IP phones with
them on national and international trips and still access what is essentially their
domestic phone line.
Softphones, on the other hand, are software applications that load VoIP services onto
a desktop or laptop computer. Some even simulate a telephone-like interface, through
which VoIP calls can be placed to anybody around the world over a standard broadband
connection.
Most VoIP services come with caller ID, call waiting, call transfer, repeat dialling and
multi-party conference call features. For additional features, the service provider may
charge an extra fee. Examples are call filtering, call forwarding, and sending calls
directly to voice mail. Most VoIP services also allow users to check their voicemail
over the web, or to have messages attached to an e-mail that is sent to their PDA or PC.
The facilities and components provided by VoIP phone system suppliers and service
operators may vary significantly, because not all of them support the same
functionality.
1.3 Advantages
Since calls are placed across the Internet, using one Internet connection for both data
traffic and voice calls allows consumers to save money. Cost reduction could therefore
be the main reason to switch to VoIP for telephone service. For instance, the cost of a
call is independent of its destination, so there is no extra charge for long-distance
calls.
VoIP can provide some additional features that make this technology even more
attractive. These features may be difficult to obtain from conventional
telecommunication companies. Examples are:
• Incoming phone calls can be automatically routed to your VoIP phone, regardless of where you are connected to the network.
• Call center agents using VoIP phones can work from anywhere with a sufficiently fast and stable Internet connection.
• Other features: multi-party conference calls, call forwarding, automatic redial, caller ID, and so forth.
1.4 Types of VoIP calls
There are three ways of connecting to a VoIP network:
• Using a VoIP telephone.
• Using a conventional telephone with a VoIP adapter.
• Using a computer with speakers and a microphone.
VoIP telephone calls are routed to other VoIP devices or to normal telephones on the
PSTN (Public Switched Telephone Network). Depending on the devices at each end, there
are four types of VoIP calls:
• PC-to-Phone call: from a VoIP device to a conventional telephone.
• PC-to-PC call: from a VoIP device to another VoIP device.
• Phone-to-PC call: from a conventional telephone to a VoIP device.
• Phone-to-Phone call: from a conventional telephone to another conventional telephone, with the call carried over an IP network in between.
Note that a VoIP device is not necessarily a PC.
1.5 Operation
In the most common setup, the end user has a high-speed broadband connection with a
router and a VoIP gateway. Instead of using a standard telephone line, the router sends
the telephone calls over the Internet connection. The VoIP gateway sits at the edge of
the Internet and is responsible for connecting the VoIP network with the PSTN.

All transmitted data (SIP signalling, audio/video data and so on) are divided into
smaller pieces called packets before being sent over the Internet. The packets carry
the information needed to bring their content back into an understandable form at the
destination. If the call is destined for a conventional telephone, the packets pass
through a VoIP gateway. There they are converted back into the format used by the PSTN
(Public Switched Telephone Network), and the call is routed to the number the caller has
dialled. In this way, old and new technology are combined seamlessly and in real time.
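The following Python sketch makes the packet idea concrete by splitting a continuous audio buffer into fixed-size payloads. It assumes G.711 audio (8000 samples per second, one byte per sample) and 20 ms of audio per packet. These values are common in VoIP but are illustrative assumptions, not parameters taken from this softphone. With them, each packet carries 8000 × 0.02 = 160 bytes of audio, which means 50 packets per second.

```python
# Sketch: dividing a continuous audio stream into fixed-size packet payloads.
# Assumed values (not taken from the thesis): G.711 audio, 8000 samples/s,
# 1 byte per sample, 20 ms of audio per packet.
from typing import List

SAMPLE_RATE = 8000      # samples per second
BYTES_PER_SAMPLE = 1    # G.711 uses 8-bit samples
PACKET_MS = 20          # packetization interval in milliseconds


def payload_size(sample_rate: int = SAMPLE_RATE,
                 bytes_per_sample: int = BYTES_PER_SAMPLE,
                 packet_ms: int = PACKET_MS) -> int:
    """Number of audio bytes carried by one packet."""
    return sample_rate * bytes_per_sample * packet_ms // 1000


def packetize(audio: bytes, size: int) -> List[bytes]:
    """Split an audio buffer into consecutive payloads of `size` bytes.

    The last payload is shorter if len(audio) is not a multiple of size.
    """
    if size <= 0:
        raise ValueError("payload size must be positive")
    return [audio[i:i + size] for i in range(0, len(audio), size)]


size = payload_size()                           # 160 bytes per 20 ms packet
packets = packetize(bytes(SAMPLE_RATE), size)   # one second of silence
print(size, len(packets))                       # prints: 160 50
```

In a real softphone, each payload would then be placed in an RTP packet and sent inside a UDP datagram.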
1.6 VoIP protocols
This section describes the main protocols required to implement a SIP-based VoIP
softphone.
1.6.1 Session Initiation Protocol (SIP)
1.6.1.1 Introduction
The Session Initiation Protocol (SIP) is an application-layer control protocol that can
establish, modify and terminate multimedia sessions (conferences) such as Internet
telephony calls. These sessions can include one or more participants. SIP can also
invite new participants and add or remove media streams in existing sessions, because
it is a flexible protocol that allows features to be added to established sessions.
The main signalling functions of the protocol are:
• Location of the end user, so that communication is possible regardless of where he/she is.
• Determination of the availability of the end user to establish a session.
• Determination of the media capabilities and negotiation of the media between the participants involved in the communication.
• Negotiation of the features supported by the end users.
• Modification of the parameters or features of an already established session.
SIP is not a service in itself. Rather, it provides signalling capabilities that can be
used to implement different services. Consequently, SIP must work together with other
protocols to meet the requirements of the users. In spite of that, SIP functionality
and operation are completely independent of the other protocols, because SIP is only
involved in the signalling part of a communication session.
An obvious example is a VoIP call. SIP is responsible for setting up, modifying and
ending the session. The Real-time Transport Protocol (RTP) delivers the real-time data,
and the Session Description Protocol (SDP) describes the multimedia session.
1.6.1.2 Protocol Design
SIP is a peer-to-peer protocol. This means that the SIP functionality resides in the
communicating endpoints, not in the network.
As explained above, SIP is an application-layer protocol in the TCP/IP model. The
protocol structure can be divided into four logical layers:
• Lowest layer: it is responsible for the syntax and encoding of SIP messages.
• Second layer (Transport layer): it defines how clients send requests and receive responses, and how servers receive requests and send responses, over the network. Every SIP element contains a transport layer.
• Third layer (Transaction layer): it matches the requests and responses transmitted through the transport layer, also handling retransmissions and timeouts.
• Upper layer (Transaction user): every SIP element except the stateless proxy is a transaction user. This layer uses the transaction layer: it creates the transactions that the transaction layer handles and processes their results.
Figure 1.1: A typical SIP network with gateways
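A short Python sketch can illustrate the matching done by the transaction layer. It follows the client-side rule of RFC 3261 (section 17.1.3). Under that rule, a response belongs to the client transaction whose request carried the same branch parameter in its top Via header and the same method in its CSeq header. The class and function names are hypothetical and model only this one rule; retransmissions and timers are left out.

```python
# Sketch of client-transaction matching in the transaction layer, following
# RFC 3261, section 17.1.3. Class and function names are hypothetical.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TransactionKey:
    branch: str  # branch parameter of the top Via header of the request
    method: str  # method name taken from the CSeq header


def branch_of(via_value: str) -> str:
    """Extract the branch parameter from a Via header value."""
    for param in via_value.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.lower() == "branch":
            return value
    raise ValueError("Via header has no branch parameter")


def key_for(top_via: str, cseq: str) -> TransactionKey:
    """CSeq has the form '<sequence number> <method>'."""
    _, method = cseq.split()
    return TransactionKey(branch_of(top_via), method.upper())


class ClientTransactions:
    """Keeps the requests that are still waiting for responses."""

    def __init__(self) -> None:
        self.pending: Dict[TransactionKey, str] = {}

    def send(self, top_via: str, cseq: str, description: str) -> None:
        self.pending[key_for(top_via, cseq)] = description

    def match(self, top_via: str, cseq: str) -> Optional[str]:
        """Return the request a response belongs to, or None if none matches."""
        return self.pending.get(key_for(top_via, cseq))


via = "SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds"
layer = ClientTransactions()
layer.send(via, "1 INVITE", "INVITE to user2")
print(layer.match(via, "1 INVITE"))  # prints: INVITE to user2
print(layer.match(via, "1 CANCEL"))  # prints: None
```

The method is part of the key because a CANCEL reuses the branch of the INVITE it cancels, yet forms a separate transaction.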
1.6.1.3 SIP Clients and Servers
The behaviour of the five SIP entities is described below.
• User Agent Client (UAC): builds SIP requests and sends them to the UAS.
• User Agent Server (UAS): receives and handles SIP requests from the UAC and prepares SIP responses.
• Stateless Proxy Server: receives requests from the transport layer and routes them to the next hop based on the message content, without storing any information about the request. For that reason, it cannot distinguish between an original message and a retransmission. Stateless proxies do not run any SIP timers and cannot generate provisional responses such as 100 Trying.
• Stateful Proxy Server: analyses received requests in more depth than a stateless proxy. It validates the request and its recipient, routes the message and stores state information. Stateful proxies use timers to determine whether a message must be retransmitted when no response is received. Furthermore, they can require user agent authentication.
• Registrar Server: a server that receives and handles REGISTER requests. The user information contained in these messages is validated (user agent authentication is required) and used to determine the user's location in the network. User agents send this type of request periodically in order to update their location information.
Figure 1.2: SIP clients and servers
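The role of the registrar can be sketched in Python as a small location service. This is a simplified, hypothetical model. It keeps a single contact per address-of-record and assumes the REGISTER request has already been authenticated. It also uses the expiration time requested by the user agent, in seconds. A binding that is not refreshed before it expires is discarded, which is why user agents re-register periodically. As in RFC 3261, an expiration value of 0 removes the binding.

```python
# Hypothetical sketch of a registrar's location service: each REGISTER
# request creates or refreshes a binding from an address-of-record (AOR)
# to a contact URI, and the binding expires unless it is refreshed.
import time
from typing import Callable, Dict, Optional, Tuple


class LocationService:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        # AOR -> (contact URI, absolute expiry time); one contact per AOR
        self.bindings: Dict[str, Tuple[str, float]] = {}

    def register(self, aor: str, contact: str, expires: int) -> None:
        """Handle an already authenticated REGISTER request."""
        if expires == 0:
            self.bindings.pop(aor, None)  # "Expires: 0" removes the binding
        else:
            self.bindings[aor] = (contact, self.clock() + expires)

    def lookup(self, aor: str) -> Optional[str]:
        """Return the registered contact, or None if unknown or expired."""
        entry = self.bindings.get(aor)
        if entry is None:
            return None
        contact, expiry = entry
        if self.clock() >= expiry:
            del self.bindings[aor]  # not refreshed in time
            return None
        return contact
```

The clock is passed in as a parameter so that expiry can be tested with a simulated time source instead of waiting in real time.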
1.6.1.4 SIP Messages
Two kinds of SIP messages are defined:
• Request: it is sent from the client to the server.
• Response: it is sent from the server to the client.
The two kinds also differ in their syntax and in the fields that form the message: a
request starts with a request line, a response with a status line.
The SIP specification (RFC 3261) defines six main requests, also called methods:
• INVITE: Invites a user to take part in a session.
• ACK: Acknowledges receipt of a final response to an INVITE request.
• BYE: Ends an existing session.
• CANCEL: Cancels a pending request.
• OPTIONS: Asks a server for information about its capabilities.
• REGISTER: Registers the user's current location with a registrar.
The following figures show some examples of SIP message exchanges:
Figure 1.3: Example of SIP INVITE (call flow between user1, a proxy server and user2: INVITE, 180 Ringing, 200 OK, ACK, BYE, 200 OK)
Figure 1.4: Example of SIP REGISTER
Additional requests have been defined in SIP extensions. One example is Session
Initiation Protocol (SIP)-Specific Event Notification (RFC 3265), which introduces the
SUBSCRIBE and NOTIFY methods. This document describes how UACs can subscribe to specific
events, such as the presence of their contacts, and how they are notified of these
events.
Figure 1.5: Example of a SIP extension: SUBSCRIBE - NOTIFY
Each response carries a three-digit status code that indicates the outcome of the
request. According to the first digit of the status code, SIP responses are classified
into six classes:
1xx : Provisional - The request has been received and is still being processed.
2xx : Success - The request has been successfully processed.
3xx : Redirection - Further action is needed to complete the request; for example, the client must send the request to another address.
4xx : Client Error - The request has not succeeded because of a client error. Further action is needed according to the response, such as modifying the original request before retrying it.
5xx : Server Error - The request has not succeeded because of a server error: the server was not able to process an apparently valid request.
6xx : Global Failure - The request cannot be fulfilled at any server. The client should not retry it.
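Classifying responses is straightforward, because the class of a response depends only on the first digit of its status code. A minimal Python sketch follows; the function names are illustrative. It also reflects that 1xx responses are provisional, while responses of every other class are final responses.

```python
# Map a SIP status code to its response class, using the first digit.
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(status_code: int) -> str:
    """Return the class name of a three-digit SIP status code."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a valid SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code: int) -> bool:
    """1xx responses are provisional; all other classes are final."""
    return response_class(status_code) != "Provisional"


print(response_class(180), is_final(180))  # prints: Provisional False
print(response_class(486), is_final(486))  # prints: Client Error True
```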
The most important header fields of a SIP message are:
• Request-URI: it normally contains the SIP URI of the To field (except in REGISTER requests, where it refers to the domain where the registrar server is located).
• Via: it indicates the transport used to send the message and the address to which the response must be sent. A request can carry several Via fields, one for each hop it has traversed, so that the response can be routed back along the same path. The field must also contain a branch parameter, which identifies the transaction and has the same value at both the UAC and the server.
• To: it contains the SIP address of the recipient of the request. This address is a SIP URI.
• From: it contains the SIP address of the user who sent the request. It is normally identical to the To field only in REGISTER requests.
• Call-ID: a unique identifier for each call, which allows all messages belonging to the same call to be grouped together.
• CSeq: it contains a sequence number and the method name. The sequence number is incremented by one for each new request sent by the same user within a call. It allows requests to be kept in order and lost messages to be detected.
• Contact: it contains a SIP URI of the user's current location.
• Max-Forwards: an integer used to limit the number of hops a request can take on the way to its destination. Its initial value is usually 70, and it is decremented by one at each hop. If it reaches 0 before the request arrives at its destination, the request is rejected with a 483 Too Many Hops response.
• Content-Type: it describes the media type of the message body sent to the recipient, and it is only present if the body is not empty. For VoIP calls it is normally application/sdp, indicating that the SIP message carries an SDP session description.
• Content-Length: it contains the size of the message body in octets, as a decimal number.
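The following Python sketch assembles an INVITE request with the header fields listed above and an SDP body. All URIs, addresses and the SDP content are made-up examples. The function is illustrative and does not reproduce the messages generated by the RTC Client API. Three details are worth noting. The branch parameter starts with the "magic cookie" z9hG4bK required by RFC 3261, header lines end with CRLF, and Content-Length counts the octets of the body.

```python
# Illustrative sketch: assembling a SIP INVITE with an SDP body.
# All URIs, addresses and SDP values below are made-up examples.
import uuid

CRLF = "\r\n"

EXAMPLE_SDP = CRLF.join([
    "v=0",
    "o=user1 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
]) + CRLF


def build_invite(caller: str, callee: str, contact: str, via_host: str,
                 sdp: str, cseq: int = 1) -> bytes:
    body = sdp.encode("utf-8")
    lines = [
        f"INVITE {callee} SIP/2.0",  # request line; callee is the Request-URI
        # every branch starts with the RFC 3261 magic cookie "z9hG4bK"
        f"Via: SIP/2.0/UDP {via_host};branch=z9hG4bK{uuid.uuid4().hex[:16]}",
        "Max-Forwards: 70",
        f"To: <{callee}>",
        f"From: <{caller}>;tag={uuid.uuid4().hex[:8]}",
        f"Call-ID: {uuid.uuid4().hex}",
        f"CSeq: {cseq} INVITE",
        f"Contact: <{contact}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body)}",  # size of the body in octets
    ]
    # an empty line separates the header fields from the body
    return (CRLF.join(lines) + CRLF + CRLF).encode("utf-8") + body


invite = build_invite("sip:user1@example.com", "sip:user2@example.com",
                      "sip:user1@192.0.2.10:5060", "192.0.2.10:5060",
                      EXAMPLE_SDP)
print(invite.decode("utf-8"))
```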
1.6.2 Session Description Protocol (SDP)
1.6.2.1 Introduction
In order to establish videoconferences, VoIP calls or other types of session, media
capabilities, transport addresses and other session description information must be
communicated to the participants.
SDP provides a standard representation of this information. It presents the information
to the participants in a way that allows them to decide whether to take part in a
session.
SDP is purely a format for session description. It does not provide a transport method
or negotiation mechanism of its own, and it can be used together with different
transport protocols as appropriate. One example is SIP, which carries SDP descriptions
in its message bodies.
An SDP session description must include the following information: IP address, port
number, media type and media encoding format. SDP can also contain additional
information, such as the subject of the session, its start and stop times, or contact
information about the session.
These are the most important fields (lines) of an SDP description:
• Session description
– v: It shows the version of the Session Description Protocol (currently 0).
– o: It contains the originator of the session (username and user address) and a session identifier.
– s: It is the session name.
– c: It contains connection information, including network type, address type and connection address.
– b: It specifies the proposed bandwidth to be used by the session or media.
• Time description
– t: It indicates the start and stop times of the session. If both values are zero, the session is considered permanent.
• Media description (repeated for each type of media)
– m: It contains the media type, port, transport protocol and media formats.
– k: If the description is transported over a secure and trusted channel, this field can be used to convey encryption keys.
– a: It defines media attributes. Normally, a description contains several lines of this kind.
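A session description can be read with a few lines of Python. Every SDP line has the form <type>=<value>, and each m= line opens a new media description to which the following lines belong. The sketch below is a simplified parser written for illustration. It does not validate field order or the syntax of individual values.

```python
# Simplified SDP parser for illustration: session-level lines are collected
# first; every "m=" line opens a new media description that owns the
# following lines. Field order and value syntax are not validated.
from typing import Dict, List


def parse_sdp(text: str) -> Dict[str, object]:
    session: Dict[str, List[str]] = {}
    media: List[dict] = []
    current: dict = session              # where the next lines belong
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        kind, sep, value = line.partition("=")
        if not sep or len(kind) != 1:
            raise ValueError(f"malformed SDP line: {raw!r}")
        if kind == "m":
            # m=<media> <port> <proto> <fmt> ...
            mtype, port, proto, *formats = value.split()
            current = {"media": mtype, "port": int(port),
                       "proto": proto, "formats": formats}
            media.append(current)
        else:
            current.setdefault(kind, []).append(value)
    return {"session": session, "media": media}


example = "\r\n".join([
    "v=0",
    "o=user1 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
])
sdp = parse_sdp(example)
print(sdp["session"]["c"])          # prints: ['IN IP4 192.0.2.10']
print(sdp["media"][0]["formats"])   # prints: ['0', '8']
```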
1.6.2.2 Operation
This section describes how two participants agree on the parameters needed to
establish a media session using SIP. This negotiation method is known as the
offer/answer model (RFC 3264). One participant offers a description of his/her
available media streams, and the other participant answers the offer. Both the offer
and the answer must be valid SDP messages, following RFC 4566.
The offer must contain all the media streams the offerer wants to use, including the IP
addresses and ports on which to receive them. For each media stream, the RTP payload
types and codecs have to be specified. If the offer contains several formats for one
media stream, all of them can be used during the session, but they have to be listed in
order of preference. The other participant should use the format in the highest position
of the list, if possible.
The answer must contain a corresponding media stream for each stream in the offer,
indicating the IP addresses and ports on which to receive them. It must also indicate
which media streams and codecs are supported. If there is no media format in common for
a given media stream, that stream must be rejected by setting its port to zero. If
there is no media format in common for any media stream, the whole media session must
be rejected.
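The answerer's side of this negotiation can be sketched in Python. For each offered stream, it keeps the formats it also supports, in the offerer's order of preference. When no format is in common, it rejects the stream with port 0. If every stream is rejected, the whole session is rejected. The data layout (media type plus a list of RTP payload type numbers) is a simplification chosen for this example, and codec parameters such as a=rtpmap lines are ignored.

```python
# Sketch of the answerer's side of the offer/answer model. A stream is a
# media type plus RTP payload type numbers in order of preference; codec
# parameters (a=rtpmap lines) are ignored for simplicity.
from typing import Dict, List, Tuple

Stream = Tuple[str, List[str]]  # (media type, payload types)


def answer_stream(offered: List[str], supported: List[str],
                  local_port: int) -> Tuple[int, List[str]]:
    """Return (port, formats) for the answer's m= line of one stream."""
    supported_set = set(supported)
    common = [fmt for fmt in offered if fmt in supported_set]  # offer order
    if not common:
        # Port 0 rejects the stream; SDP still needs at least one format.
        return 0, offered[:1]
    return local_port, common


def answer_session(offer: List[Stream], supported: Dict[str, List[str]],
                   local_ports: List[int]):
    """Answer every offered stream; the session fails if all are rejected."""
    answer = []
    for (media, formats), port in zip(offer, local_ports):
        answer_port, answer_formats = answer_stream(
            formats, supported.get(media, []), port)
        answer.append((media, answer_port, answer_formats))
    accepted = any(port != 0 for _, port, _ in answer)
    return answer, accepted


offer = [("audio", ["0", "8", "3"]), ("video", ["31"])]
supported = {"audio": ["8", "0"], "video": ["34"]}
answer, accepted = answer_session(offer, supported, [49170, 51372])
print(answer)    # prints: [('audio', 49170, ['0', '8']), ('video', 0, ['31'])]
print(accepted)  # prints: True
```

Keeping the offerer's order follows the recommendation of RFC 3264. A rejected m= line still lists one format, because SDP requires at least one.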
When the participant who sent the offer receives the answer, he/she must identify
the accepted streams and formats and can start sending and receiving media.
Since SIP allows the parameters of an established session to be modified, either
participant can generate a new offer at any time to update the session through a new
negotiation.
1.6.3 Real-time Transport Protocol (RTP)
RTP is an application-layer protocol that defines a standardized packet format for
delivering data with real-time characteristics, such as audio and video, over the
Internet. The services provided by RTP include payload type identification, sequence
numbering, timestamping and delivery monitoring. RTP does not by itself guarantee
quality of service or timely delivery. However, it provides functionality for detecting
some of the problems caused by transmission over an unreliable IP network. These include
packet loss, variable transport delay, out-of-sequence arrival and asymmetric routing.
As with SDP, RTP is not responsible for delivering packets. Instead, it relies on
transport protocols such as UDP (User Datagram Protocol) or TCP (Transmission Control
Protocol) for this. RTP packets cannot be transmitted over the network by themselves:
they are usually encapsulated in UDP datagrams.
The payload field of an RTP packet contains the real-time data. Information about the
data, such as its source, format and timing, is carried in the header fields. The
complete header structure of an RTP packet is detailed below:
• Version (V): This field identifies the version of RTP (2 bits; the current version is 2).
• Padding (P): If the padding bit is set, the packet contains one or more additional padding octets at the end which are not part of the payload. The last padding octet indicates how many padding octets, including itself, must be ignored. Padding may be needed by some encryption algorithms with fixed block sizes, or for carrying several RTP packets in one lower-layer protocol data unit.
• Extension (X): If the extension bit is set, the fixed header is followed by exactly one header extension. This mechanism allows individual implementations to experiment with new payload-format-independent functions that require additional information in the RTP header. Implementations that do not use the extension may ignore it.
• CSRC count (CC): It contains the number of CSRC identifiers that follow the fixed header.
• Marker (M): Its interpretation is defined by a profile. It is intended to mark significant events in the packet stream, such as frame boundaries.
• Payload type (PT): It identifies the format of the RTP payload and determines its interpretation by the application.
• Sequence number: It is used to detect lost or out-of-sequence packets and to restore the original packet order. Its initial value is chosen randomly for the first packet of a stream, and it is then incremented by one for each RTP packet sent.
• Timestamp: It reflects the sampling instant of the first octet in the RTP payload. Several consecutive packets have equal timestamps if they were generated at the same instant, for example when they belong to the same video frame. Audio and video in a media session are transmitted as separate streams on different ports. For that reason, the timestamp is essential for the receiver to reconstruct the audio and video data and, furthermore, to synchronize them, for example in a videoconference.
• Synchronization Source (SSRC): A random number that identifies the source of an RTP stream within an RTP session. A user can receive RTP packets from several sources at the same time. However, no two synchronization sources in the same session have the same SSRC identifier, so the original source of each packet can be identified.
• CSRC list: It identifies the contributing sources for the payload contained in this packet. Up to 15 contributing sources can be listed.
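The fixed part of the RTP header is 12 bytes long (RFC 3550, section 5.1). It consists of V (2 bits), P (1 bit), X (1 bit), CC (4 bits), M (1 bit) and PT (7 bits), then a 16-bit sequence number, a 32-bit timestamp and a 32-bit SSRC. It is followed by CC 32-bit CSRC identifiers. The Python sketch below packs and parses this layout with the standard struct module. Padding and header extensions are only reported as flags, not processed; the function names are illustrative.

```python
# Sketch: packing and parsing the 12-byte fixed RTP header (RFC 3550, 5.1)
# with the standard struct module. Padding and header extensions are only
# reported as flags; their contents are not processed here.
import struct
from typing import Sequence

RTP_VERSION = 2
FIXED = struct.Struct("!BBHII")  # V/P/X/CC, M/PT, sequence, timestamp, SSRC


def pack_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                    marker: bool = False, csrcs: Sequence[int] = ()) -> bytes:
    if len(csrcs) > 15:
        raise ValueError("the CC field allows at most 15 CSRC identifiers")
    byte0 = (RTP_VERSION << 6) | len(csrcs)              # P = 0, X = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = FIXED.pack(byte0, byte1, seq & 0xFFFF,
                        timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_rtp_header(packet: bytes) -> dict:
    if len(packet) < FIXED.size:
        raise ValueError("packet shorter than the 12-byte fixed header")
    byte0, byte1, seq, timestamp, ssrc = FIXED.unpack_from(packet)
    cc = byte0 & 0x0F
    end = FIXED.size + 4 * cc
    if len(packet) < end:
        raise ValueError("packet too short for its CSRC list")
    csrcs = list(struct.unpack_from(f"!{cc}I", packet, FIXED.size)) if cc else []
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": cc,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
        "payload_offset": end,  # valid only when no header extension is present
    }


header = pack_rtp_header(payload_type=0, seq=1234, timestamp=160,
                         ssrc=0x1234ABCD, marker=True)
print(len(header), parse_rtp_header(header)["sequence_number"])  # prints: 12 1234
```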
RTP allows, but does not itself provide, encryption of the media flows. Generally, IPsec
or SRTP (Secure Real-time Transport Protocol) is used for this purpose.
RTP is usually described as consisting of two closely related protocols:
• Real-time Transport Protocol (RTP): it carries the real-time data.
• Real-time Transport Control Protocol (RTCP): it carries information about the quality of the RTP session and about the participants in the session.
1.6.3.1 Real-time Transport Control Protocol (RTCP)
The RTP Control Protocol (RTCP) provides control information and quality feedback for
the data flows of a multimedia application. It works together with RTP, but does not
transport any media data itself. The protocol gathers connection statistics and
information about bytes sent, packets lost, jitter and so forth. Note that RTCP by
itself does not offer any kind of authentication or encryption.
The information provided by this protocol is used for flow and congestion control in the network. For example, if the statistics show a large number of lost packets, the sender can modify its transmission by limiting the flow or by switching the media stream to a codec with higher compression (lower bit rate). RTCP packets are also used to detect and diagnose problems in the network.
In addition, participants in a session use RTCP packets to exchange some basic identity data, such as the user name and the domain being used.
The types of RTCP packets are:
• SR: Sender report, carrying transmission and reception statistics from participants that are active senders (packet type 200).
• RR: Receiver report, carrying reception statistics from participants that are not active senders (packet type 201).
• SDES: Source description items (packet type 202).
• BYE: Indicates the end of participation (packet type 203).
• APP: Application-specific functions (packet type 204).
RTCP packet structure depends on the type of packet. The packet structure detailed
below corresponds to a sender report (SR) packet. The only difference between the
sender report (SR) and receiver report (RR) forms, besides the packet type code, is
that the sender report includes a 20-byte sender information section for use by active
senders. This kind of packet is more complex than the others and has a greater number of fields.
• Version: It identifies the RTP version, which is the same in RTCP packets as in RTP data packets.
• Padding: This field has the same function as in RTP packets, but refers to the RTCP packet.
• Reception report count (RC): It indicates the number of reception report blocks contained in this packet. Its value can be zero.
• Packet type (PT): In this case, it contains the constant 200 to identify this as an RTCP SR packet.
• Length: It is the length of this RTCP packet in 32-bit words minus one, including the header and any padding.
• SSRC: It is the synchronization source identifier of the originator of this SR packet, i.e. the same SSRC that this sender uses in its RTP data packets.
• Network Time Protocol (NTP) timestamp: It indicates the wall-clock time (absolute date and time) at which the report was sent, so that it can be used in combination with the timestamps returned in reception reports from other receivers to measure the round-trip propagation time to those receivers. It has two subfields: the most significant word (MSW) and the least significant word (LSW).
• RTP timestamp: It corresponds to the same instant as the NTP timestamp, but in the same units and with the same random offset as the RTP timestamps in data packets. In most cases this timestamp is not equal to the RTP timestamp of any adjacent data packet; rather, it must be calculated from the corresponding NTP timestamp using the relationship between the RTP timestamp counter and real time.
• Sender’s packet count: It contains the total number of RTP data packets transmitted by the sender from the start of transmission up to the time this SR packet was generated. This count should be reset if the sender changes its SSRC identifier.
• Sender’s octet count: It indicates the total number of payload octets transmitted in RTP data packets by the sender from the start of transmission up to the time this SR packet was generated. This count should be reset if the sender changes its SSRC identifier.
• Source identifier (SSRC): It is the SSRC identifier of the RTP source to which the information in this reception report block pertains.
• Fraction lost: It indicates the fraction of RTP packets from source SSRC that have been lost since the previous SR or RR packet was sent.
• Cumulative number of packets lost: It refers to the total number of RTP packets from source SSRC that have been lost since the beginning of reception. It is calculated as the number of packets expected minus the number of packets actually received.
• Extended highest sequence number received: The low 16 bits contain the highest sequence number received in an RTP data packet from source SSRC, and the most significant 16 bits extend that sequence number with the corresponding count of sequence number cycles (calculated according to the algorithm in Appendix A.1 of RFC 3550).
• Interarrival jitter: It is an estimate of the statistical variance of the RTP data packet interarrival time, measured in timestamp units and expressed as an unsigned integer.
• Last SR timestamp (LSR): It contains the middle 32 bits of the 64-bit NTP timestamp received as part of the most recent RTCP sender report (SR) packet from source SSRC. If no SR has been received yet, the field is set to zero.
• Delay since last SR (DLSR): It is the delay, expressed in units of 1/65536 seconds, between receiving the last SR packet from source SSRC and sending this reception report block. If no SR packet has been received yet from SSRC, the DLSR field is set to zero.
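The receiver-side values of a report block can be derived roughly as follows. This is a minimal Python sketch that follows the algorithms described in RFC 3550 (appendix A.3 for the fraction lost, section 6.4.1 and appendix A.8 for the jitter); the function and class names are hypothetical, and initializing the jitter state on the first packet is an assumption of this sketch. The value written into the interarrival jitter field would be the integer part of the estimate.

```python
from dataclasses import dataclass
from typing import Optional


def fraction_lost(expected_interval: int, received_interval: int) -> int:
    """8-bit 'fraction lost': packets lost since the last report, scaled by 256."""
    lost_interval = expected_interval - received_interval
    if expected_interval == 0 or lost_interval <= 0:
        return 0  # no loss, or duplicates made up for it
    return (lost_interval << 8) // expected_interval


def extended_highest_seq(cycles: int, max_seq: int) -> int:
    """High 16 bits: sequence-number wrap-arounds; low 16 bits: highest seq seen."""
    return ((cycles << 16) | (max_seq & 0xFFFF)) & 0xFFFFFFFF


def cumulative_lost(extended_max: int, base_seq: int, received: int) -> int:
    """Packets expected minus packets received; negative if duplicates arrived."""
    expected = extended_max - base_seq + 1
    return expected - received


@dataclass
class JitterEstimator:
    """Running interarrival jitter estimate, J += (|D| - J) / 16."""
    jitter: float = 0.0
    last_transit: Optional[int] = None

    def update(self, arrival_ts: int, rtp_ts: int) -> float:
        # Both arguments are in RTP timestamp units (e.g. 1/8000 s for PCMU).
        transit = arrival_ts - rtp_ts
        if self.last_transit is not None:
            d = abs(transit - self.last_transit)
            self.jitter += (d - self.jitter) / 16.0
        self.last_transit = transit
        return self.jitter
```

For example, if 100 packets were expected in an interval and 75 arrived, fraction_lost returns 64, i.e. 64/256 = 25 %.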
1.7 VoIP Clients
Nowadays, a large number of different VoIP clients are available. The following tables show some examples of freely usable VoIP softphones for computers and some others for mobile devices. Little information is available about VoIP clients for mobile devices because most of them are distributed under proprietary licenses.
1.7.1 VoIP Clients running on different OS
Application | Operating System | License | Language | Other
ATLSIP | Linux, Windows | MPL, GPL, LGPL | C++ | It is written using the Active Template Library.
Ekiga | Linux, Mac OS X, BSD | GNU/GPL | C++ |
Eyeball Messenger | Linux and uClinux, Windows 2000/XP, Windows Mobile, Windows CE | Proprietary | | It is based on the Eyeball Messenger SDK. It is available for PC, PDA and embedded platforms.
FreeSWITCH | Linux, Windows, Mac OS X, BSD, Solaris | Open source | C++ |
KCall | Linux | GNU/GPL | | It is a VoIP application for the KDE desktop environment.
Kphone | Linux | GNU/GPL | C++ | It uses Qt.
Linphone | Linux, Windows XP | Freeware | C | It uses eXosip (SIP user agent library based on libosip2), mediastreamer2 (powerful library for audio/video streaming and processing) and oRTP (RTP library).
Minisip | Linux, Windows XP | GNU/GPL | | It will soon be available on Pocket PC.
MjUA | | GNU/GPL | Java | It is based on the MjSIP stack.
OpenWengo | Linux, Windows, Mac OS X | GNU/GPL | C++ |
OpenZoep | Windows | GNU/GPL | C++ |
PhoneGaim | Linux, Windows | GNU/GPL | |
PJSUA | Linux, Windows, Windows CE/Mobile, Mac OS X, Symbian OS | GNU/GPL | C++ | It is based on the PJSIP stack.
SFLphone | Linux | GNU/GPL | C++ | It should be portable to BSD operating systems.
Shtoom | Linux, Windows, Mac OS X | GNU/GPL | Python |
SIPCommunicator | Linux, Windows, Mac OS X | GNU/GPL | Java |
sipXphone | Linux, Windows | GNU/GPL | Java |
TudoMais | Windows | GNU/GPL | Java |
Twinkle | Linux | GNU/GPL | C++ | It uses KDE libraries.
VMukti | Windows | Open source | C# | It is based on .NET 3.0.
WxCommunicator | Windows XP/2000 | GNU/GPL | C++ | It is based on the sipXtapi client library and the wxWidgets 2.8.4 GUI library.
XMeeting | Mac OS X | Open source | |
YATE | Linux, Windows | GNU/GPL | C++ | It supports scripting in various programming languages (such as embedded PHP, Python and Perl).
YeaPhone | Linux | GNU/GPL | | It is based on the Linphone stack.
Zap | Linux, Windows, Mac OS | Open source | JavaScript | It is based on Mozilla.
Table 1.1: VoIP Clients running on different OS
1.7.2 VoIP Clients for mobile devices
Application | Operating System | Others
AGEphone | Windows Mobile 5.0 for Pocket PC | It is based on the microSIP stack, developed in C/C++.
Articulation | Palm OS 5.0 or greater |
BeWip | Windows Mobile OS |
CiceroPhone | Windows Mobile 5.0, Windows PPC2003, Symbian OS |
ExpressTalk | Windows Pocket PC, Windows Mobile OS |
eyeP Phone Desktop | Windows Pocket PC 2003 |
iFon | Windows Mobile, Windows CE 4.X |
Microsoft Office Communicator Mobile | Windows Mobile 2003 SE for Pocket PC and Smartphone, Windows Mobile 5.0 for Pocket PC and Smartphone | It is based on the user interface of the Microsoft Office Communicator 2005 desktop client.
Microsoft Portrait | Windows Mobile 5.0 Pocket PC | It is a research prototype for mobile video communication.
MoviVoip | Palm OS 5.0 or greater |
OnePhone | Windows Mobile 5.0, Symbian OS, uLinux Mobile |
SJPhone | Windows Pocket PC 2003 |
Solegy Softphone | | It is based on the ServicePDQ platform and uses opensourcesip and opensipstack.
speaQ | Windows Mobile 5.0 PDA Edition, Linux/Qtopia |
The FirstHand Mobile Console | Windows Mobile 5.0 PDA Edition |
VOYP | Palm OS |
Woize | Windows Mobile 5.0, Windows Mobile 2003 for Pocket PC |
X-Pro | Windows Mobile 2003 |
Table 1.2: VoIP Clients for mobile devices
1.7.3 Structure and operation of softphones
As a general definition, a softphone (a blend of the English words software and telephone) is software used to make telephone calls from a computer to other softphones or to conventional telephones over the Internet. It is therefore part of a VoIP environment and makes use of the protocols described previously: SIP, SDP and RTP.
Many implementations are available nowadays. These softphones can differ in license (closed proprietary software, freeware, open source, GNU/GPL), system requirements, operating system or programming language, but their structure and operation must follow the same fundamental guidelines.
In order to develop a VoIP softphone, there are two main approaches:
• Using libraries, a platform or an API, such as the RTC Client API from Microsoft, in which all the necessary protocols are already defined and implemented according to their RFC specifications.
• Programming step by step all the features, functionalities, requirements, parameters and so forth that are defined in the RFC of each required protocol.
In the second case, to implement a VoIP softphone based on SIP, the following documents must be taken into consideration:
• RFC 3261: Session Initiation Protocol (SIP)
• RFC 4566: Session Description Protocol (SDP)
• RFC 1889/3550: A Transport Protocol for Real-Time Applications (RTP/RTCP); RFC 3550 obsoletes RFC 1889.
Regardless of the method chosen to develop the softphone, it must contain at least the following parts:
• SIP package: it should define all the classes and methods required to provide and manage SIP services.
• SDP package: it should define all the classes and methods required to provide and manage SDP services.
• RTP package: it should define all the classes and methods required to provide and manage RTP services. This package should also provide RTCP services.
• User interface: it should help the final user to interact with the softphone and use its functionalities, regardless of how it is implemented.
These protocol packages must implement methods to build, send, receive, analyze and process packets. As explained before, none of these protocols provides its own means of transmission over the network; instead, their messages are usually encapsulated in UDP packets. Consequently, the softphone must also define classes and methods to send, receive and process IP/UDP packets, including the sockets and ports needed for the communication channels. Usually there are two channels for SIP messages (sending and receiving) and another two channels for RTP packets (sending and receiving).
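A minimal Python sketch of this socket setup, assuming plain UDP over IPv4. It follows the RFC 3550 convention of placing RTP on an even port and RTCP on the next odd port; since a single UDP socket can both send and receive, each sending/receiving channel pair described above can map to one socket. The port range 40000-40200 and the function names are arbitrary assumptions for illustration (5060 is the registered SIP port; port 0 lets the operating system choose one).

```python
import socket
from typing import Tuple


def open_sip_socket(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """UDP socket used both to send and to receive SIP messages."""
    sip = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sip.bind((host, port))
    return sip


def open_rtp_rtcp_pair(host: str = "127.0.0.1", first_port: int = 40000,
                       last_port: int = 40200) -> Tuple[socket.socket, socket.socket]:
    """Bind an RTP socket on an even port and its RTCP socket on port + 1."""
    start = first_port + (first_port % 2)   # round up to an even port
    for port in range(start, last_port, 2):
        rtp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        rtcp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            rtp.bind((host, port))
            rtcp.bind((host, port + 1))
            return rtp, rtcp
        except OSError:                      # port busy: try the next pair
            rtp.close()
            rtcp.close()
    raise OSError("no free even/odd UDP port pair in the given range")
```

The RTP port obtained this way is the one a softphone would announce in the m= line of its SDP description.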
The operation of SIP-based VoIP softphones is best analyzed and understood by means of some examples.
1.7.3.1 Registration procedure
In this example, the user carla1 wants to register with a registrar server. Figure 1.6 shows the SIP message exchange between the client and the server.
Figure 1.6: SIP registration procedure
Firstly, the user sends a SIP REGISTER request to a registrar server. The SIP package must build a SIP request including the header fields already explained in Section 1.6.1.4.
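As a rough illustration, the Python sketch below assembles a minimal REGISTER request with the CRLF line endings required by RFC 3261. The helper name and its parameters are hypothetical, the address-of-record format sip:user@domain is an assumption (the URIs were lost from the captured traces in this chapter), the Expires header is an assumed addition, and the Via branch parameter with the z9hG4bK prefix is included because RFC 3261 requires it, even though the captured RTC request does not show one.

```python
import uuid
from typing import Iterable


def build_register(user: str, domain: str, local_ip: str, local_port: int,
                   cseq: int, call_id: str, from_tag: str, expires: int = 3600,
                   extra_headers: Iterable[str] = ()) -> str:
    """Assemble a minimal SIP REGISTER request (hypothetical helper)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]   # RFC 3261 magic-cookie prefix
    aor = f"sip:{user}@{domain}"                  # assumed address-of-record format
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",        # Request-URI: the registrar domain
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{aor}>;tag={from_tag}",
        f"To: <{aor}>",                           # REGISTER: To carries the AOR
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REGISTER",                 # incremented on each new request
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",
        f"Expires: {expires}",
        *extra_headers,                           # e.g. an Authorization header
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"        # empty line ends the header section
```

Rebuilding the request after a 401 response then amounts to calling the same helper with the next CSeq value, the same Call-ID and From tag, and the Authorization header passed in extra_headers.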
The SIP server receives the message and uses the information to manage the request.
Meanwhile, it sends a provisional response (100 TRYING) to the user in order to
indicate that it is performing some action and does not yet have a definitive response.
The UAC processes the TRYING response and waits for a new response from the
server.
After that, the UAC receives the response of the server, 401 UNAUTHORIZED.
This response indicates that the request requires the user to perform authentication
because it does not contain the proper credentials.
The UAC receives this message and processes the information. It then rebuilds the SIP request, adding an Authorization header field to the message. The Authorization
field value consists of credentials containing the authentication information of the UAC
for the action requested as well as parameters required in support of authentication
and replay protection.
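The response parameter of this field is a Digest value as defined in RFC 2617, which SIP reuses. Below is a minimal Python sketch, assuming the MD5 algorithm announced in the challenge; the function name is hypothetical. Without a qop parameter, as in the Authorization headers shown in this chapter, the response is MD5(HA1:nonce:HA2) with HA1 = MD5(username:realm:password) and HA2 = MD5(method:uri).

```python
import hashlib
from typing import Optional


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, password: str, realm: str, method: str,
                    uri: str, nonce: str, qop: Optional[str] = None,
                    nc: str = "00000001", cnonce: str = "") -> str:
    """'response' parameter of Digest authentication with MD5 (RFC 2617, 3.2.2)."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")
    ha2 = _md5_hex(f"{method}:{uri}")
    if qop is None:
        # Legacy form without qop, as used in the traces of this chapter
        return _md5_hex(f"{ha1}:{nonce}:{ha2}")
    # qop="auth": the nonce count and client nonce are mixed in as well
    return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

The password of carla1 does not appear in the trace, so the response value feea44258515c993945155793cc1c8d6 cannot be reproduced here; with qop=auth, the function reproduces the worked example of RFC 2617 (response 6629fae49393a05397450978507c4ef1). The same computation is used for the Proxy-Authorization header sent after a 407 response.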
The new message will have the form:
Request-Line: REGISTER sip:141.24.93.180 SIP/2.0
Message Header
Via: SIP/2.0/UDP 141.24.172.62:15966
Max-Forwards: 70
From: ;tag=6a09ca1e73c846b681c288ed49dbd071;epid=e04c9989ea
To:
Call-ID: 81f1170b12304217a90082a48324756a@141.24.172.62
CSeq: 2 REGISTER
Contact: ;methods="INVITE, MESSAGE, INFO, SUBSCRIBE, OPTIONS, BYE, CANCEL, NOTIFY, ACK, REFER"
User-Agent: RTC/1.2.4949
Authorization: Digest username="carla1", realm="asterisk", algorithm=MD5, uri="sip:141.24.93.180",
nonce="6c839ae6", response="feea44258515c993945155793cc1c8d6"
Event: registration
Allow-Events: presence
Content-Length: 0
In this message, the CSeq value has been incremented and the new Authorization field has been included. Once again, the server sends a provisional response while it is analyzing the request.
Finally, the request is successfully accepted and processed by the registrar server, which answers with a 200 OK response:
Status-Line: SIP/2.0 200 OK
Message Header
Via: SIP/2.0/UDP 141.24.172.62:15966;received=141.24.172.62
From: ;tag=6a09ca1e73c846b681c288ed49dbd071;epid=e04c9989ea
To: ;tag=as45ef369c
Call-ID: 81f1170b12304217a90082a48324756a@141.24.172.62
CSeq: 2 REGISTER
User-Agent: Asterisk PBX
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY
Expires: 120
Contact: ;expires=120
Date: Thu, 31 May 2007 07:34:12 GMT
Content-Length: 0
The UAC processes the new response. The expires parameter in the Contact field indicates how long the registration remains valid, in seconds. Before this interval runs out, the UAC should send another REGISTER request in order to keep the server informed about its location.
Although this is a complete and normal registration procedure, there are many other possible outcomes depending on the responses of the SIP server, and the UAC must be able to process and handle each of them.
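One possible way to organize this is to classify each response by its status code. The Python sketch below is hypothetical: it treats 1xx as provisional, 2xx as success, 401 and 407 as authentication challenges and everything else as a failure, and it reads the registration lifetime from the Contact expires parameter or, failing that, from the Expires header. Refreshing after half of the lifetime and the default of 3600 s are assumed policies, not something prescribed by the text.

```python
import re
from typing import Optional, Tuple


def parse_status_line(response: str) -> int:
    """Numeric status code of a SIP response ('SIP/2.0 200 OK' -> 200)."""
    match = re.match(r"SIP/2\.0 (\d{3}) ", response)
    if match is None:
        raise ValueError("not a SIP response")
    return int(match.group(1))


def registration_expiry(response: str, default: int = 3600) -> int:
    """Lifetime in seconds: Contact 'expires' parameter first, then Expires header."""
    flags = re.IGNORECASE | re.MULTILINE
    m = re.search(r"^Contact:.*?;\s*expires=(\d+)", response, flags)
    if m is None:
        m = re.search(r"^Expires:\s*(\d+)", response, flags)
    return int(m.group(1)) if m else default   # default is an assumption


def handle_register_response(response: str) -> Tuple[str, Optional[int]]:
    """Map a REGISTER response to the next action of the UAC."""
    code = parse_status_line(response)
    if 100 <= code < 200:
        return ("wait", None)                  # e.g. 100 TRYING: keep waiting
    if 200 <= code < 300:
        lifetime = registration_expiry(response)
        return ("registered", lifetime // 2)   # assumed refresh at half-life
    if code in (401, 407):
        return ("authenticate", code)          # resend with credentials, CSeq + 1
    return ("failed", code)
```

With the 200 OK shown above (expires=120), this sketch would schedule the next REGISTER after 60 seconds.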
1.7.3.2 Multimedia session establishment
When a UAC wants to initiate a session, it originates an INVITE request, which asks a server to establish a session. The exchange of SIP messages is shown in Figure 1.7.
Figure 1.7: Multimedia SIP session establishment
In this example, the user carla1 wants to initiate a multimedia call with the user carla2. The SIP package must build a SIP INVITE request similar to the REGISTER request, as explained in Section 1.6.1.4. In the same way, the SDP package has to create an SDP session description with the fields detailed in Section 1.6.2.1.
The complete message is sent. The proxy server receives the message and processes the request. The request does not contain any credentials, and the server requires client authentication. For that reason, the server sends a 407 PROXY AUTHENTICATION REQUIRED response to inform the client.
The UAC receives this message and processes the information. This response is very similar to 401 UNAUTHORIZED. The UAC sends an ACK message and rebuilds the SIP request, adding a Proxy-Authorization header field with the proper credentials of the client.
Request-Line: INVITE sip:carla2@iptel.org SIP/2.0
Message Header
Via: SIP/2.0/UDP 141.24.172.97:16580
Max-Forwards: 70
From: "carla1" ;tag=f7231a4fdbf547408da053a9f31b0fdb;epid=99274a963c
To:
Call-ID: 6440389c3d664742af8eec47772ec291@141.24.172.97
CSeq: 2 INVITE
Contact:
User-Agent: RTC/1.2
Proxy-Authorization: Digest username="carla1", realm="iptel.org", algorithm=md5, uri="sip:carla2@iptel.org",
nonce="465e8da1a386b50d9c5608de3da1961645aa7abd", response="61ecb5b096f0de7146ec001b7f469f85"
Content-Type: application/sdp
Content-Length: 679
Message body
Session Description Protocol
Session Description Protocol Version (v): 0
Owner/Creator, Session Id (o): - 0 0 IN IP4 141.24.172.97
Session Name (s): session
Connection Information (c): IN IP4 141.24.172.97
Bandwidth Information (b): CT:1000
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 62410 RTP/AVP 97 111 112 6 0 8 4 5 3 101
Encryption Key (k): base64:I/EwJ93tvnk62iBdgpAUBAtqQDDacCxMqae5MDj1i4A
Media Attribute (a): rtpmap:97 red/8000
Media Attribute (a): rtpmap:111 SIREN/16000
Media Attribute (a): fmtp:111 bitrate=16000
Media Attribute (a): rtpmap:112 G7221/16000
Media Attribute (a): fmtp:112 bitrate=24000
Media Attribute (a): rtpmap:6 DVI4/16000
Media Attribute (a): rtpmap:0 PCMU/8000
Media Attribute (a): rtpmap:8 PCMA/8000
Media Attribute (a): rtpmap:4 G723/8000
Media Attribute (a): rtpmap:5 DVI4/8000
Media Attribute (a): rtpmap:3 GSM/8000
Media Attribute (a): rtpmap:101 telephone-event/8000
Media Attribute (a): fmtp:101 0-16
Media Attribute (a): encryption:optional
Media Description, name and address (m): video 45406 RTP/AVP 34 31
Encryption Key (k): base64:YQdcx0AUpMFh2+xW8A1qcZ8NTeN6qEEcjoyfejsVgmo
Media Attribute (a): rtpmap:34 H263/90000
Media Attribute (a): rtpmap:31 H261/90000
Media Attribute (a): encryption:optional
The proxy server receives this message and routes it to the destination user, either directly or through another proxy server. Normally, if the message is routed through another proxy server, that server sends a provisional response to the first proxy server and routes the message on to the destination user.
The UAC processes the TRYING response and waits for a new response from the server.
When the destination user agent receives the INVITE request, it sends a provisional 180 RINGING response indicating that the user is being alerted. The proxy server then forwards this response using the information in the Via field.
The UAC processes the RINGING response and waits for a final (non-provisional) response. When the destination user finally accepts the call, its user agent sends a 200 OK response to its proxy server, which forwards the message using the Via field; the next proxy server forwards it again in the same way to the UAC that sent the original request.
Status-Line: SIP/2.0 200 OK
Message Header
Via: SIP/2.0/UDP 141.24.172.97:16580;rport=1477
From: "carla1" ;tag=f7231a4fdbf547408da053a9f31b0fdb;epid=99274a963c
To: ;tag=ac93304b84c644cc88c52d415d14caac
Call-ID: 6440389c3d664742af8eec47772ec291@141.24.172.97
CSeq: 2 INVITE
Record-Route:
Record-Route:
Record-Route:
Contact:
User-Agent: RTC/1.2
Content-Type: application/sdp
Content-Length: 708
P-Behind-NAT: Yes
Message body
Session Description Protocol
Session Description Protocol Version (v): 0
Owner/Creator, Session Id (o): - 0 0 IN IP4 141.24.92.247
Session Name (s): session
Connection Information (c): IN IP4 213.192.59.66
Bandwidth Information (b): CT:1000
Time Description, active time (t): 0 0
Media Description, name and address (m): audio 6260 RTP/AVP 97 111 112 6 0 8 4 5 3 101
Encryption Key (k): base64:t7AMm5DBS7WjFacOTXt9B+vImF15vDxVPGVgL8fY5GY
Media Attribute (a): rtpmap:97 red/8000
Media Attribute (a): rtpmap:111 SIREN/16000
Media Attribute (a): fmtp:111 bitrate=16000
Media Attribute (a): rtpmap:112 G7221/16000
Media Attribute (a): fmtp:112 bitrate=24000
Media Attribute (a): rtpmap:6 DVI4/16000
Media Attribute (a): rtpmap:0 PCMU/8000
Media Attribute (a): rtpmap:8 PCMA/8000
Media Attribute (a): rtpmap:4 G723/8000
Media Attribute (a): rtpmap:5 DVI4/8000
Media Attribute (a): rtpmap:3 GSM/8000
Media Attribute (a): rtpmap:101 telephone-event/8000
Media Attribute (a): fmtp:101 0-16
Media Attribute (a): encryption:optional
Media Description, name and address (m): video 42648 RTP/AVP 34 31
Encryption Key (k): base64:rjR7lz3kmGVpahZErkninx1gTFaZMyNr+Y35W1pSHtg
Media Attribute (a): recvonly
Media Attribute (a): rtpmap:34 H263/90000
Media Attribute (a): rtpmap:31 H261/90000
Media Attribute (a): encryption:optional
Media Attribute (a): nortpproxy:yes
This response indicates that the session has been accepted, although possibly not in its original form. The UAC must process the message, verify that it belongs to the original INVITE request using the Call-ID and CSeq fields, and analyze the content of the returned SDP description. The INVITE request sent from carla1 to carla2 offered a multimedia session with audio and video, but the OK response indicates that the other user accepts audio in both directions and video only in the receive direction (a=recvonly), i.e. carla2 will receive video but not send it. In order to complete the establishment of the session with the corresponding parameters and media types, the UAC must send an ACK message.
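A simplified Python sketch of this analysis, operating on the raw SDP lines (m=..., a=...) rather than on the decoded form shown above. It pairs each offered media line with the corresponding answer line, keeps the payload types present in both, treats port 0 as a rejected stream (RFC 3264), and reads only media-level direction attributes; session-level attributes and rtpmap details are ignored, which is a simplifying assumption. The function names are hypothetical.

```python
from typing import Dict, List

DIRECTIONS = ("sendrecv", "sendonly", "recvonly", "inactive")


def parse_media(sdp: str) -> List[Dict]:
    """Split a raw SDP body into media sections: type, port, payloads, direction."""
    media: List[Dict] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            kind, port, _proto, *fmts = line[2:].split()
            media.append({"type": kind,
                          "port": int(port.split("/")[0]),  # ignore optional /count
                          "payloads": [int(f) for f in fmts],
                          "direction": "sendrecv"})         # SDP default
        elif media and line.startswith("a=") and line[2:] in DIRECTIONS:
            media[-1]["direction"] = line[2:]
    return media


def negotiated(offer: str, answer: str) -> List[Dict]:
    """Compare offer and answer media line by line (same order per RFC 3264)."""
    result = []
    for off, ans in zip(parse_media(offer), parse_media(answer)):
        common = [pt for pt in ans["payloads"] if pt in off["payloads"]]
        result.append({"type": ans["type"],
                       "accepted": ans["port"] != 0 and bool(common),
                       "payloads": common,
                       "answerer_direction": ans["direction"]})
    return result
```

Fed with the m= lines of the INVITE and of the 200 OK above, this sketch reports audio as sendrecv and video as recvonly from carla2's point of view.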
The task of the proxy servers is to help the two UACs locate and contact each other. They do not need to keep any knowledge of the fact that a session has been established between the users. Once the ACK has been received by the destination UA, the users can start exchanging RTP packets. It is important to realize that media is usually transmitted end-to-end and not through any proxy server.
According to the media type and the remaining parameters in the SDP session description, the RTP package should provide the means to generate RTP packets and send them through the network encapsulated in IP/UDP packets. The complete header structure of an RTP packet was detailed in Section 1.6.3.
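As an illustration of how the sequence number and timestamp described in Section 1.6.3 evolve from packet to packet, the following hypothetical Python class produces consecutive RTP packets: the sequence number grows by one per packet and wraps at 65536, while the timestamp grows by the number of samples per packet. The default of 160 samples assumes 20 ms packets of an 8000 Hz codec such as PCMU; the initial values are drawn at random when not given.

```python
import random
import struct
from typing import Optional


class RtpSender:
    """Builds consecutive RTP packets for one stream (V=2, no padding/extension/CSRCs)."""

    def __init__(self, payload_type: int, ssrc: Optional[int] = None,
                 first_seq: Optional[int] = None, first_ts: Optional[int] = None,
                 samples_per_packet: int = 160):
        self.payload_type = payload_type & 0x7F
        self.ssrc = random.getrandbits(32) if ssrc is None else ssrc & 0xFFFFFFFF
        self.seq = random.getrandbits(16) if first_seq is None else first_seq & 0xFFFF
        self.ts = random.getrandbits(32) if first_ts is None else first_ts & 0xFFFFFFFF
        self.step = samples_per_packet   # 160 = 20 ms at 8000 Hz (assumption)

    def next_packet(self, payload: bytes, marker: bool = False) -> bytes:
        header = struct.pack("!BBHII",
                             0x80,                                    # V=2, P=0, X=0, CC=0
                             (0x80 if marker else 0) | self.payload_type,
                             self.seq, self.ts, self.ssrc)
        self.seq = (self.seq + 1) & 0xFFFF                            # wraps after 65535
        self.ts = (self.ts + self.step) & 0xFFFFFFFF                  # advances by samples
        return header + payload
```

Each returned byte string would then be sent through the RTP socket to the address and port announced by the remote party in its SDP description.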
The RTP header is followed by the RTP payload. This is an example of a real RTP packet:
Real-Time Transport Protocol
10.. .... = Version: RFC 1889 Version (2)
..0. .... = Padding: False
...0 .... = Extension: False
.... 0000 = Contributing source identifiers count: 0
0... .... = Marker: False
Payload type: SIREN (111)
Sequence number: 33555
Timestamp: 3169076983
Synchronization Source identifier: 2281082149
Payload: 5994FCBD0BD53F49C59C69B1E9F73449C6D6DC1CC62A6294...
As mentioned before, another protocol works together with RTP: the RTP Control Protocol (RTCP), which provides control information and quality-of-service feedback for the data flow of a multimedia application.
This is an example of a real RTCP SR packet related to the previous RTP packet which
contains all the header fields explained in 1.6.3.1.
Real-time Transport Control Protocol (Sender Report)
10.. .... = Version: RFC 1889 Version (2)
..0. .... = Padding: False
...0 0001 = Reception report count: 1
Packet type: Sender Report (200)
Length: 12
Sender SSRC: 2281082149
Timestamp, MSW: 3389590617 (0xca090c59)
Timestamp, LSW: 2918510592 (0xadf4f000)
[MSW and LSW as NTP timestamp: May 31, 2007 08:56:57,6795 UTC]
RTP timestamp: 3169077087
Sender’s packet count: 47
Sender’s octet count: 2280
Source 1
Identifier: 3436236073
SSRC contents
Fraction lost: 0 / 256
Cumulative number of packets lost: 16777213
Extended highest sequence number received: 52467
Sequence number cycles count: 0
Highest sequence number received: 52467
Interarrival jitter: 3
Last SR timestamp: 200748212 (0x0bf72cb4)
Delay since last SR timestamp: 29204
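The numeric fields of this report can be checked by hand. The Python sketch below (helper names hypothetical) converts the NTP timestamp to UTC, derives the middle 32 bits that a receiver would echo in its LSR field, and applies the round-trip formula A - LSR - DLSR of RFC 3550. It also shows that the cumulative loss value 16777213 is the 24-bit signed encoding of -3: RFC 3550 defines this field as signed, so a negative value simply means that more packets arrived than were expected, for example because of duplicates.

```python
from datetime import datetime, timedelta, timezone

NTP_TO_UNIX_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_datetime(msw: int, lsw: int) -> datetime:
    """Convert a 64-bit NTP timestamp (MSW = seconds, LSW = fraction) to UTC."""
    seconds = (msw - NTP_TO_UNIX_OFFSET) + lsw / 2**32
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=seconds)


def middle_32_bits(msw: int, lsw: int) -> int:
    """Value a receiver copies into the LSR field after receiving this SR."""
    return ((msw & 0xFFFF) << 16) | (lsw >> 16)


def signed_24(value: int) -> int:
    """Read the 24-bit 'cumulative number of packets lost' field as signed."""
    return value - (1 << 24) if value & 0x800000 else value


def round_trip_seconds(arrival_middle32: int, lsr: int, dlsr: int) -> float:
    """RTT = A - LSR - DLSR, all in units of 1/65536 s, modulo 2^32."""
    return ((arrival_middle32 - lsr - dlsr) & 0xFFFFFFFF) / 65536.0
```

Applied to the captured values, ntp_to_datetime(3389590617, 2918510592) yields 31 May 2007, 08:56:57.68 UTC, matching the decoded timestamp above; middle_32_bits gives 0x0c59adf4; and the DLSR of 29204 corresponds to about 0.446 s.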
In a single media session, many RTP and RTCP packets are transmitted; these are only a few examples of their structure and operation.
The media session ends when one of the UACs sends a BYE request. In this session, the user carla1 wants to end the call, so its UAC creates and sends a BYE message, which the other side answers with an OK message. The media session is finished with the reception of this message.
In summary, the basic operation of a UAC consists of preparing the communication channels and building, sending, receiving and processing packets of the different protocols needed in a complete VoIP environment.
1.7.4 Softphones for Windows Mobile OS
The previous section explained the main protocols needed to develop a VoIP softphone and their operation. The structure of a softphone on Windows Mobile OS is the same as on other operating systems: it must implement or make use of SIP, SDP and RTP packages in the same way as described in Section 1.7.3.
The characteristics specific to softphones running on Windows Mobile OS lie in their system requirements. Little information is available about them because these softphones are distributed under closed proprietary licenses, but some common specifications are listed below:
• Minimum Size Requirement: 64MB ROM, 32MB RAM
• ARM processor
• Audio codec support: G.711
One example of these softphones is AGEphone, which is based on the microSIP stack. Its system requirements are:
• CPU: ARM-type CPU 200 MHz or above
• Memory: 64MB or above
• Free Disk Space: 600 KB or above
• Connection: upstream and downstream of at least 29.2 kbit/s each
This protocol stack supports the G.711 and GSM 06.10 codecs and implements the following protocols:
• RFC3261: Session Initiation Protocol (SIP)
• RFC2327: Session Description Protocol (SDP)
• RFC1889: A Transport Protocol for Real-Time Applications (RTP)
• RFC2833: RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals
• RFC3489: STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)
• RFC3581: An Extension to the SIP for Symmetric Response Routing
• UPnP: Universal Plug and Play IGD
1.7.5 OS for mobile devices
Symbian OS is an operating system produced by Symbian Ltd. It has been designed for
mobile devices with associated libraries, user interface frameworks and reference implemen-
tations of common tools. It runs exclusively on ARM (Advanced RISC Machine) processors.
Windows Mobile is a compact operating system combined with a suite of basic applications
for mobile devices based on the Microsoft Win32 API. Devices which run Windows Mobile
include Pocket PCs, Smartphones, and Portable Media Centers (portable media player de-
vices). It is designed to be somewhat similar to desktop versions of Windows.
Nokia OS (NOS) is an informal name for the operating system in many Nokia mobile
phones. There is no such product or trademark. Officially it is referred to as ISA (Information
Source Adapter) platform. It is only available for Nokia’s internal use. It is not licensed to
anyone else yet. No direct API is provided either, but most ISA phones can be programmed
with J2ME.
Operating System Embedded (OSE) is a real-time embedded operating system created by
the Swedish firm ENEA. OSE uses signaling in the form of messages passed to and from
processes in the system. Messages are stored in a queue attached to each process.
BlackBerry OS runs on an Intel 80386 microprocessor and all devices include an embed-
ded RIM (Research In Motion) wireless modem for wireless data access. This OS supports
multitasking and multithreaded applications. Developers familiar with other operating
systems such as Windows and the MacOS will be at home in the BlackBerry environment. All
applications interact with the underlying operating system (and other applications) through
the exchange of event messages. As with most operating systems, a C language API is provided
for direct access to the system.
Palm OS is a compact operating system developed and licensed by PalmSource, Inc. for
personal digital assistants (PDAs) manufactured by various licensees. It is designed to be
easy-to-use and similar to desktop operating systems such as Microsoft Windows. Palm OS
is combined with a suite of basic applications including address book, clock, note pad, sync,
memo viewer and security software. Palm OS licensees decide which applications are included
on their Palm OS devices. The applications are primarily coded in C/C++, and a Java Runtime Environment is also available for the platform.
Linux is a Unix-like computer operating system family that uses the Linux kernel. A Linux
system which includes system utilities and libraries from the GNU Project is sometimes
referred to as GNU/Linux. The methodical design of Linux made it possible to adapt it
to a wide range of computing platforms in spite of being originally developed for Intel 386
processors. Of particular interest in this context are the ARM based architectures, as many
embedded systems and mobile devices are powered by ARM processors. Linux is a prominent
example of free software and of open source development. Its underlying source code is
available for anyone to use, modify, and redistribute freely, and in some instances the entire
operating system consists of free/open source software.
Mobile Linux and mobile Java are often described as a powerful combination. While Linux is evolving into a major standard for mobile device operating systems, Java is becoming a standard at the software application level. The J2ME/MIDP specifications have been adopted by all major mobile phone manufacturers. The MIDP (Mobile Information Device Profile) consists of a set of Java APIs that provides a J2ME (Java 2 Micro Edition) runtime environment for mobile information devices.
Mac OS X is a line of proprietary, graphical operating systems developed, marketed and sold by Apple Inc., the latest of which is preloaded on all currently shipping Macintosh computers. Mac OS X is a Unix-like operating system. A version of this operating system runs on the handheld iPhone and gives access to true desktop-class applications and software, including rich HTML email, full-featured web browsing, and applications such as calendar, text messaging, notes and address book. The iPhone is fully multitasking.
2 Microsoft RTC Client API 34
2 Microsoft RTC Client API
2.1 Introduction
The Real-time Communications Client Application Programming Interface enables develop-
ers to build applications for integrated multimodal communications. It provides the necessary
structure and interfaces to establish PC-PC, PC-phone, or phone-phone calls, Instant Messaging (IM), application sharing, and whiteboard sessions over the Internet. Furthermore,
multimedia sessions can be set up on PC-PC calls, and Presence information on a list of
contacts is also supported.
RTC Client API can be programmed with C++ or any other programming language
that can access COM components. This includes .NET languages, such as Microsoft Visual
Basic .NET and Microsoft Visual C#, which can access the RTC Client API through COM
interoperability.
The main functionalities supported by the RTC Client API are:
• Registration and provisioning
• Publishing presence
• Contact management
• Polling presence
• Instant Messaging
• Multimedia calls
• Call control
• Session negotiation
• User search
• Authentication
• Signalling privacy
• Media privacy
2.2 Object Model Overview
The basic coding model for RTC is COM (Component Object Model). The main objects
used for communication in RTC are the Client, Session, Profile, Participant, Buddy, and
Watcher objects, and the interfaces used to create and manage them are IRTCClient,
IRTCSession, IRTCProfile, IRTCParticipant, IRTCBuddy, and IRTCWatcher, respectively.
Figure 2.1: RTC Client COM Objects
The client object is the basis of the RTC Client. It establishes the session types and
session parameters, the preferred audio and video devices, and other media capabilities.
This object is required to create all the other objects.
The session object is used to manage all the tasks related to a real-time session, such
as initiating, answering, or terminating sessions, adding or removing participants,
configuring media security, and storing information about media types. There are four
kinds of sessions: PC-to-PC, PC-to-phone, phone-to-phone, and instant messaging.
The profile object provides access to the information in a user's profile. This profile
includes information about the client account (username, password, SIP server), supported
session types and capabilities, authentication, transport protocol, and so forth. After
initializing RTC, the client application creates and enables a profile.
The participant object contains all the information and methods associated with the users
who take part in a session. Each of these users is called a "participant" and is represented
by a separate participant object.
The buddy object is used to get and set information about the user's contacts. It provides
data such as the name or the status of a contact. This object is created when a user adds a
new contact to his contact list.
The watcher object is used to get and set information about the state of a watcher. When
a user adds a new buddy, the buddy's client creates a watcher object for that user, which
maintains the information needed to keep the user informed of the buddy's presence.
The buddy and the watcher objects are used to manage the presence information.
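To make the relationships between these objects concrete, the following Python sketch models them as plain data classes. It is a hypothetical illustration, not the RTC API: the class and method names (Client, create_profile, create_session, add_participant, add_buddy) are invented here, and only the ownership structure is taken from the description above, namely the client object as the root that creates profiles, sessions, buddies, and watchers, and sessions that hold participants.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SessionType(Enum):
    # The four session kinds named in the text
    PC_TO_PC = "PC-to-PC"
    PC_TO_PHONE = "PC-to-phone"
    PHONE_TO_PHONE = "phone-to-phone"
    IM = "instant messaging"


@dataclass
class Participant:
    uri: str                      # SIP URI of a user taking part in a session


@dataclass
class Session:
    kind: SessionType
    participants: List[Participant] = field(default_factory=list)

    def add_participant(self, uri: str) -> Participant:
        # Each user in the session gets its own participant object
        p = Participant(uri)
        self.participants.append(p)
        return p


@dataclass
class Profile:
    username: str
    server: str                   # SIP server the user registers with
    enabled: bool = False


@dataclass
class Buddy:
    uri: str
    name: str
    status: str = "offline"


@dataclass
class Watcher:
    uri: str                      # user who is watching our presence


@dataclass
class Client:
    """Hypothetical root object: all other objects are created through it."""
    profiles: List[Profile] = field(default_factory=list)
    sessions: List[Session] = field(default_factory=list)
    buddies: List[Buddy] = field(default_factory=list)
    watchers: List[Watcher] = field(default_factory=list)

    def create_profile(self, username: str, server: str) -> Profile:
        prof = Profile(username, server)
        self.profiles.append(prof)
        return prof

    def create_session(self, kind: SessionType) -> Session:
        s = Session(kind)
        self.sessions.append(s)
        return s

    def add_buddy(self, uri: str, name: str) -> Buddy:
        b = Buddy(uri, name)
        self.buddies.append(b)
        return b
```

For example, `Client().create_session(SessionType.PC_TO_PC).add_participant("sip:bob@example.com")` mirrors the order described above: client first, then session, then participant.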
2.3 Architecture
To provide its functionality, the RTC Client API uses industry-standard protocols such as:
• Session Initiation Protocol (SIP)
• Session Description Protocol (SDP)
• Real-time Transport Protocol (RTP)
• PSTN and Internet Interworking (PINT)
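The following Python sketch shows, under stated assumptions, how the first three protocols fit together: a SIP INVITE request carries an SDP body that announces the RTP port and codec for the audio stream. It is not the exact message produced by the RTC Client API; the addresses (from the 192.0.2.0/24 documentation range), tags, Call-ID, and the choice of PCMU (RTP payload type 0) are illustrative assumptions, and PINT is not covered.

```python
def build_sdp(ip: str, rtp_port: int, session_id: int) -> str:
    # SDP offer for one audio stream sent over RTP (profile RTP/AVP);
    # payload type 0 is PCMU (G.711 u-law) sampled at 8000 Hz.
    lines = [
        "v=0",
        f"o=- {session_id} {session_id} IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
    ]
    return "\r\n".join(lines) + "\r\n"


def build_invite(caller: str, callee: str, ip: str, sip_port: int,
                 rtp_port: int, call_id: str, branch: str, tag: str) -> str:
    # SIP request line and mandatory headers, followed by the SDP body
    body = build_sdp(ip, rtp_port, session_id=1)
    user = caller.split("@")[0]          # e.g. "sip:alice"
    headers = [
        f"INVITE {callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {ip}:{sip_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{caller}>;tag={tag}",
        f"To: <{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <{user}@{ip}:{sip_port}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body.encode())}",
    ]
    # A blank line separates the SIP headers from the SDP body
    return "\r\n".join(headers) + "\r\n\r\n" + body
```

The branch parameter should begin with the magic cookie `z9hG4bK` required by RFC 3261, for example `build_invite("sip:alice@example.com", "sip:bob@example.com", "192.0.2.10", 5060, 49170, "a84b4c76e66710", "z9hG4bK776asdhds", "1928301774")`.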
2.4 .NET Platform
2.4.1 Introduction
.NET is a software platform that connects information, systems, people, and devices. The
.NET platform links a wide variety of personal and business technologies, from mobile phones
to corporate servers, giving access to important information wherever and whenever it is
needed. Built on XML Web services standards, .NET allows new and existing systems and
applications to connect their data and transactions independently of the operating system
version, the type of computer or mobile device, or the programming language used to create
them.
Code written for the .NET Framework platform is called managed code. Regardless of
which .NET language is employed, the output of the language compiler is a representation
of the same logic in an intermediate language called CIL (Common Intermediate Language)
or MSIL (Microsoft Intermediate Language). The programming languages that can be used
on the .NET platform include C#, C++, Visual Basic .NET, J#, JScript .NET, Windows
PowerShell, IronPython, and F#.
2.4.2 Operation
The operation of the .NET platform is based on three main elements:
• .NET languages, that have been previously enumerated.
• Base Class Library (BCL), which is a library of types available to all .NET languages
and provides many classes covering common functions, including file reading and writing,
graphic rendering, database interaction, and XML document manipulation.
• Common Language Runtime (CLR), which is explained below.
Figure 2.2: Overview of the Common Language Infrastructure
The most important component of the .NET Framework is the Common Language Infrastructure
(CLI). The CLI is responsible for providing a language platform for application development
and execution, including components for exception handling, security, interoperability, and
so forth. Microsoft's implementation of the CLI is called the Common Language Runtime
(CLR). The CLR is composed of four primary parts:
• Common Type System (CTS)
• Common Language Specification (CLS)
• Just-In-Time Compiler (JIT)
• Virtual Execution System (VES)
Managed code is compiled into a combination of MSIL and metadata. These are packaged
into a Portable Executable (PE) file, called an assembly, which can then be executed on any
CLR-capable machine. When this executable runs, the JIT compiler translates the CIL of each
method into native code the first time the method is called, so all .NET Framework components
ultimately run as native code. Because such code requires the CLR at run time in order to
execute, it is referred to as managed code. The purpose of the CLR is to control the execution
of the code that runs on the .NET Framework.
2.4.3 Advantages
For software developers, the .NET Framework is an important change. It offers capabilities
and responsibilities that had previously been provided separately by programming languages
and tools from various sources. Incorporating these features into the operating system brings
a number of advantages, including:
• Assuring the availability of framework features to all programs written in any of the .NET languages.
• Providing programmers with a common means of accessing framework features, regardless of programming language.
• Guaranteeing consistent behaviour within the framework, regardless of programming language.
• Allowing the operating system to provide some guarantees of program behaviour that it could not otherwise offer.
• Reducing the complexity and limitations of program-to-program communication, even when those programs are written in different .NET languages.
3 Development of VoIP softphone for Windows 2000/XP
3.1 Understanding the source code
Before continuing to develop the softphone, it is essential to understand how it has been
built, that is, to know the softphone's structure and the functionalities that are already
implemented.
By analyzing the application's source code and its behaviour at run time, the following
operations can be identified:
• Initializing the RTC Client object: creates the client object.
• Listening on RTC events: allows the client to determine which specific events the application needs and to ignore the rest.
• Creating and enabling a profile: creates the profile object from the configuration parameters of the client object in order to register a user on a server.
• Handling events: identifies and controls incoming events.
• Starting a session and making a call: configures the type of session, adds a participant, and creates the session object.
• Answering a call: manages an incoming call.
• Terminating a call: ends an existing session.
• Disabling the profile: deregisters a user and disables the profile.
• Shutting down the client: stops the operation of the client object and disables the rest of the existing objects.
These basic steps make up the softphone framework and allow it to fulfil its main purpose:
the transmission of voice over the Internet by means of SIP session establishment.
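The sequence of operations above can be summarised as a small state machine. The Python sketch below is a hypothetical model, not the softphone's own code: the state names, the Event flag values, and the method names are assumptions chosen to mirror the listed steps, and the event filter is modelled as a bitmask so that events the application does not need are simply ignored.

```python
from enum import Enum, IntFlag, auto
from typing import List, Optional, Tuple


class Event(IntFlag):
    """Hypothetical event categories (the real API defines its own flag set)."""
    REGISTRATION = auto()
    SESSION = auto()
    PARTICIPANT = auto()
    MEDIA = auto()
    BUDDY = auto()


class State(Enum):
    OFF = auto()          # no client object
    INITIALIZED = auto()  # client object created
    REGISTERED = auto()   # profile enabled, user registered
    IN_CALL = auto()      # a session is active


class Softphone:
    def __init__(self, wanted: Event) -> None:
        self.state = State.OFF
        self.wanted = wanted                  # event filter chosen by the listener
        self.handled: List[Tuple[Event, str]] = []
        self.remote: Optional[str] = None

    def _move(self, expected: State, new: State) -> None:
        # Reject operations that are called out of order
        if self.state is not expected:
            raise RuntimeError(f"cannot leave {self.state.name} for {new.name}")
        self.state = new

    def initialize(self) -> None:        # create the client object
        self._move(State.OFF, State.INITIALIZED)

    def enable_profile(self) -> None:    # register the user on the server
        self._move(State.INITIALIZED, State.REGISTERED)

    def on_event(self, event: Event, info: str) -> None:
        if event & self.wanted:           # events outside the filter are ignored
            self.handled.append((event, info))

    def call(self, uri: str) -> None:    # start a session, add a participant
        self._move(State.REGISTERED, State.IN_CALL)
        self.remote = uri

    def answer(self, uri: str) -> None:  # accept an incoming call
        self._move(State.REGISTERED, State.IN_CALL)
        self.remote = uri

    def terminate(self) -> None:         # end the active session
        self._move(State.IN_CALL, State.REGISTERED)
        self.remote = None

    def disable_profile(self) -> None:   # deregister the user
        self._move(State.REGISTERED, State.INITIALIZED)

    def shutdown(self) -> None:          # stop the client object
        self._move(State.INITIALIZED, State.OFF)
```

Modelling the steps this way makes ordering errors, such as shutting down the client while a call is still active, fail loudly instead of leaving objects in an inconsistent state.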
3.2 New functionalities
Nowadays, a typical VoIP softphone includes some features that are not strictly related to
its primary function but offer the user useful additional services. For that purpose, five
new functionalities have been added to the softphone. Each of them is explained below.
3.2.1 Volume bar for microphone and speakers
This functionality allows the user to configure and adjust the audio settings. To increase
or decrease the volume level of the microphone or speakers, the user only needs to move the
Microphone Volume or Speakers Volume trackbar, respectively.
Furthermore, the Audio and Video Tuning Wizard helps the user to verify that the camera,
speakers, and microphone are working properly. Before using the Wizard, it is important to
do the following:
• Close all other programs that show video or play or record sound.
• Make sure that the camera, speakers, and microphone are plugged in and turned on.
These functionalities are implemented by the client object and the methods used are
included in the RTCClientClass class.
• set_Volume(RTC_AUDIO_DEVICE enDevice, long lVolume), where the input parameters are the audio device (microphone or speakers) and the volume level.
• InvokeTuningWizard()
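Moving a trackbar produces a position that must be turned into a volume level before it is passed as lVolume to set_Volume. The Python sketch below shows one way to do this with a linear mapping and clamping, plus the reverse mapping for initialising the trackbar from the current volume. The trackbar range (0 to 10) and the volume range (0 to 100) are assumptions for illustration only; the valid range for lVolume should be checked in the RTC Client API documentation.

```python
def trackbar_to_volume(position: int, track_min: int = 0, track_max: int = 10,
                       vol_min: int = 0, vol_max: int = 100) -> int:
    """Linearly map a trackbar position onto a volume level.

    The default ranges are assumptions, not values taken from the RTC API.
    """
    if track_max <= track_min or vol_max <= vol_min:
        raise ValueError("ranges must be non-empty")
    # Clamp first so an out-of-range position cannot produce an invalid volume
    position = max(track_min, min(track_max, position))
    fraction = (position - track_min) / (track_max - track_min)
    return round(vol_min + fraction * (vol_max - vol_min))


def volume_to_trackbar(volume: int, track_min: int = 0, track_max: int = 10,
                       vol_min: int = 0, vol_max: int = 100) -> int:
    """Inverse mapping, e.g. to set the trackbar from the current volume."""
    if track_max <= track_min or vol_max <= vol_min:
        raise ValueError("ranges must be non-empty")
    volume = max(vol_min, min(vol_max, volume))
    fraction = (volume - vol_min) / (vol_max - vol_min)
    return round(track_min + fraction * (track_max - track_min))
```

Clamping in both directions keeps the user interface and the device volume consistent even if the device reports a value outside the expected range.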
3.2.2 Sending DTMF signals
Dual-tone multi-frequency (DTMF) is a system of signal tones used in telecommunications.
When the user presses a dial-pad button corresponding to a digit, two tones of specific
frequencies are sent simultaneously. The receiver, normally a switching centre, can decode
them and detect which digit was pressed. The tones lie within the voice frequency band and
are divided into two groups (low and high); each DTMF signal uses one tone from each group.
These signals are used in applications such as voice mail, help desks, and telephone banking,
for instance to select configuration options or to operate remote control systems.
The following table shows the frequencies associated with each decimal digit:
Button or digit   Low frequency (Hz)   High frequency (Hz)
1                 697                  1209
2                 697                  1336
3                 697                  1477
4                 770                  1209
5                 770                  1336
6                 770                  1477
7                 852                  1209
8                 852                  1336
9                 852                  1477
0                 941                  1336
Table 3.1: DTMF frequencies
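To illustrate Table 3.1, the following Python sketch generates a DTMF tone as the sum of its low and high sinusoids and decodes it again by measuring the energy at each of the seven frequencies with the Goertzel algorithm, a common choice for DTMF detection. The sampling rate (8000 Hz), tone duration (50 ms), and equal amplitudes are assumptions, and this is not necessarily how the RTC Client API or a switching centre implements DTMF.

```python
import math
from typing import List

# Frequency pairs from Table 3.1 (digits only)
DTMF = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "0": (941, 1336),
}
LOW = [697, 770, 852, 941]
HIGH = [1209, 1336, 1477]


def tone(digit: str, rate: int = 8000, duration: float = 0.05) -> List[float]:
    """Samples of the digit's low and high sinusoids, each at half amplitude."""
    low, high = DTMF[digit]
    n = int(rate * duration)
    return [0.5 * math.sin(2 * math.pi * low * t / rate) +
            0.5 * math.sin(2 * math.pi * high * t / rate) for t in range(n)]


def goertzel_power(samples: List[float], freq: float, rate: int) -> float:
    """Signal power at `freq`, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def detect(samples: List[float], rate: int = 8000) -> str:
    """Pick the strongest low and high tone and map the pair back to a digit."""
    low = max(LOW, key=lambda f: goertzel_power(samples, f, rate))
    high = max(HIGH, key=lambda f: goertzel_power(samples, f, rate))
    for digit, pair in DTMF.items():
        if pair == (low, high):
            return digit
    raise ValueError("no digit in the table uses this frequency pair")
```

Goertzel is used here instead of a full FFT because only seven frequencies are of interest, so evaluating seven single-frequency filters is cheaper than computing a complete spectrum.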
This functionality is provided by the client object. The method used is SendDTMF(RTC_DTMF
enDTMF), included in the RTCClientClass class. The input parameter is an enumeration value
that specifies which DTMF tone should be sent.
This method sends a DTMF tone to the active session and plays a feedback tone on the RTC
default audio device.
3.2.3 Addition of videoconference
Videoconference calls can be used in many different situations, which is one of the reasons
the technology is so popular. Although many people use videoconferencing for recreation,
general uses include business meetings, educational training, and collaboration among
health officials. In fact, videoconferencing has been used in a wide variety of fields, such as
telemedicine, telecommunications, education, surveillance, security, and emergency response.
Perhaps the biggest benefit videoconferencing offers is the ability to meet people in
remote locations without the time, distance, or cost of travelling. It can be used to keep in
touch with people all over the world without leaving home.
'A picture is worth a thousand words.' Videoconferencing does not replace real-life meetings,
but it enhances face-to-face communication, making it easier and more natural regardless of
where people are located.
After the audio/video session between two participants has been established, the client
object processes the received and sent video data. This object gets the incoming and outgoing
video streams and shows each of them in a separate video window. The method used is
get_IVideoWindow(RTC_VIDEO_DEVICE enDevice, out IVideoWindow VWindow), included in the
RTCClientClass class. The input parameter is an enumeration that specifies the video device
(receive or preview); the output parameter receives an interface used to control the video
window properties.
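The receive and preview windows obtained through get_IVideoWindow still have to be placed on the application's form. IVideoWindow is the DirectShow video-window interface, whose SetWindowPosition method takes left, top, width, and height. The Python sketch below computes such rectangles for a common picture-in-picture layout: the remote video is letterboxed to fit the panel and the local preview sits in its bottom-right corner. The 4:3 aspect ratio, the preview scale, and the margin are assumptions, and the helpers (fit, layout) are hypothetical, not part of any API.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def fit(panel_w: int, panel_h: int, aspect: float) -> Rect:
    """Largest rectangle with the given aspect ratio, centred in the panel."""
    if panel_w / panel_h > aspect:          # panel too wide: bars left and right
        w, h = round(panel_h * aspect), panel_h
    else:                                   # panel too tall: bars top and bottom
        w, h = panel_w, round(panel_w / aspect)
    return Rect((panel_w - w) // 2, (panel_h - h) // 2, w, h)


def layout(panel_w: int, panel_h: int, aspect: float = 4 / 3,
           preview_scale: float = 0.25, margin: int = 8) -> Tuple[Rect, Rect]:
    """Return (receive, preview): the remote video fills the panel and the
    local preview is inset in the bottom-right corner of the remote video."""
    receive = fit(panel_w, panel_h, aspect)
    pw = int(receive.width * preview_scale)
    ph = round(pw / aspect)
    preview = Rect(receive.left + receive.width - pw - margin,
                   receive.top + receive.height - ph - margin, pw, ph)
    return receive, preview
```

Recomputing the layout whenever the form is resized keeps both video windows correctly proportioned.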
3.2.4 Contact List
The contact list, also called address book, is a feature that allows users to store their
friends' personal information locally. It also lets users know whether their contacts are
online. Users can call their friends with just a few clicks, which is easy, fast, and convenient.
All the information about the user's buddies is persisted in a file on the user's computer.
This is made possible by the presence information service, which is responsible for updating
the contacts' presence status and notifying others of the user's status. Calls are made
through a registrar server that maintains the current location information of the contacts.
The first stage consists of registering the user on the SIP server and enabling presence.
The presence service can be enabled before the user's profile is registered on the server. The
main steps are: create profile, enable presence, set presence status, enable profile.
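The notification behaviour of the presence service can be sketched as a publish/subscribe model. The Python code below is a hypothetical, simplified view of the server side: the PresenceService class, its methods, and the four Status values are assumptions (the RTC API defines its own, larger set of presence states). A watcher subscribing to a presentity learns its current status at once and is then notified whenever that status changes.

```python
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    OFFLINE = "offline"
    ONLINE = "online"
    AWAY = "away"
    BUSY = "busy"


Notify = Callable[[str, Status], None]


class PresenceService:
    """Hypothetical presence server: stores each user's status and
    notifies everyone watching that user when it changes."""

    def __init__(self) -> None:
        self._status: Dict[str, Status] = {}
        self._watchers: Dict[str, List[Notify]] = {}

    def subscribe(self, presentity: str, callback: Notify) -> Status:
        # Register a watcher and return the current status straight away
        self._watchers.setdefault(presentity, []).append(callback)
        return self._status.get(presentity, Status.OFFLINE)

    def publish(self, presentity: str, status: Status) -> None:
        if self._status.get(presentity) == status:
            return                       # nothing changed, so nobody is notified
        self._status[presentity] = status
        for callback in self._watchers.get(presentity, []):
            callback(presentity, status)
```

Suppressing notifications when the status has not changed avoids flooding watchers with redundant updates.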
Once the profile is registered and presence is enabled, adding a new contact to the address
book is simple. The IRTCClientPresence interface provides methods to add a buddy, remove a
buddy, enumerate watchers, set the local presence status, and so forth.
Once the buddy object has been created, the client can use the IRTCBuddy interface to get the
buddy's presentity URI, the buddy's name, SIP number, and status, and other data associated
with the buddy.
The contact list can be retrieved by querying the client object through the
IRTCClientPresence interface. From this interface, the contacts can be enumerated by calling
the EnumerateBuddies method.
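Because the buddies are persisted in a local file, the contact list must be saved and restored across sessions. The Python sketch below uses JSON for this purpose; the file format, the Contact class, and the helper functions are assumptions for illustration and do not reflect the format actually used by the RTC Client API. Only the name and presentity URI are stored, since the presence status is refreshed by the presence service at run time.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Contact:
    name: str
    uri: str                  # presentity URI, e.g. sip:bob@example.com
    status: str = "offline"   # volatile presence data, not written to disk


def save_contacts(path: Path, contacts: List[Contact]) -> None:
    """Write the name and URI of every contact to a JSON file."""
    data = [{"name": c.name, "uri": c.uri} for c in contacts]
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_contacts(path: Path) -> List[Contact]:
    """Read the contact list back; a missing file means an empty list."""
    if not path.exists():
        return []
    entries = json.loads(path.read_text(encoding="utf-8"))
    return [Contact(name=e["name"], uri=e["uri"]) for e in entries]


def add_contact(contacts: List[Contact], name: str, uri: str) -> Contact:
    """Append a contact unless its URI is already in the list."""
    for c in contacts:
        if c.uri == uri:
            return c
    c = Contact(name, uri)
    contacts.append(c)
    return c
```

Keying contacts by URI rather than by display name prevents the same buddy from being added twice under different names.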
3.2.5 Encryption of media
It is essential to be aware of the risks of using VoIP, especially for telephony, an application
that is developing rapidly. Combining telephony with computing also introduces security holes
and dangers. Unsecured VoIP communications give attackers a great opportunity for malicious
activities: they can record calls as audio files, replay calls, make calls with a false identity,
generate busy tones, or manipulate call queues. Many programs for these purposes are available
on the Internet. For that reason, VoIP applications must provide security: the confidentiality,
integrity, and authenticity of data must be guaranteed at all times.
In cryptography, encryption is the process of transforming information to make it unreadable
to anyone except those possessing special knowledge, usually referred to as a key.
One might think that encrypting the media streams is sufficient to secure a VoIP
communication, but this is not the case. Some media encryption protocols, such as the Secure
Real-time Transport Protocol (SRTP), do not provide any method for key exchange or key
management and rely on SIP signalling for this purpose. So, if SIP signalling is not encrypted
or otherwise protected, anyone could obtain this key. In conclusion, to guarantee a secure
VoIP communication, it is necessary to encrypt both the media associated with a session and
all SIP traffic.
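The key-exposure problem can be made concrete. One widely used way of carrying SRTP keys in SIP signalling is SDP Security Descriptions (SDES, RFC 4568), where the master key and salt travel base64-encoded in an a=crypto attribute of the SDP body. The Python sketch below, which assumes SDES is the key-exchange mechanism in use, extracts those keys from an SDP body; anyone able to read unprotected SIP traffic could do the same. The function and type names are hypothetical.

```python
import base64
from typing import List, NamedTuple


class CryptoAttr(NamedTuple):
    tag: int
    suite: str
    key: bytes      # SRTP master key followed by the master salt


def extract_srtp_keys(sdp: str) -> List[CryptoAttr]:
    """Return every SDES key found in an SDP body ('a=crypto:' lines)."""
    found = []
    for line in sdp.splitlines():
        if not line.startswith("a=crypto:"):
            continue
        tag, suite, params = line[len("a=crypto:"):].split(" ", 2)
        # Key params look like "inline:<base64>|<lifetime>|<MKI>", possibly
        # followed by session parameters; keep only the base64 key material.
        key_b64 = params.split("inline:", 1)[1].split("|", 1)[0].split()[0]
        found.append(CryptoAttr(int(tag), suite, base64.b64decode(key_b64)))
    return found
```

With the suite AES_CM_128_HMAC_SHA1_80, the decoded value is 30 bytes: a 16-byte master key and a 14-byte master salt, which is everything needed to decrypt the SRTP stream.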
SIP is not an easy protocol to secure. Encrypting the whole message would be the best way to
ensure security; however, SIP requests and responses cannot be encrypted in their entirety,
because some header fields, such as Via, must be readable and modifiable by intermediaries
such as proxy servers. For that reason, lower-layer security mechanisms, which work hop by
hop, are recommended for SIP. With these mechanisms, servers are authenticated, so end users
can be sure of whom they are communicating with.
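The following Python sketch shows why Via must stay readable and writable: a proxy forwarding a request inserts its own Via header above the existing ones, and on the way back it removes that header from the response (RFC 3261, section 16). The function names, host names, and branch values are hypothetical, and only the long header form "Via:" is handled.

```python
from typing import List, Tuple


def _split(message: str) -> Tuple[List[str], str]:
    # Header lines, plus the blank-line separator and body kept unchanged
    head, sep, body = message.partition("\r\n\r\n")
    return head.split("\r\n"), sep + body


def _first_via(lines: List[str]) -> int:
    # The compact form "v:" is not recognised in this sketch
    for i, line in enumerate(lines):
        if line.lower().startswith("via:"):
            return i
    return -1


def add_via(request: str, host: str, port: int, branch: str,
            transport: str = "UDP") -> str:
    """Proxy forwarding a request: insert its own Via above the existing ones."""
    lines, rest = _split(request)
    idx = _first_via(lines)
    lines.insert(idx if idx >= 0 else 1,
                 f"Via: SIP/2.0/{transport} {host}:{port};branch={branch}")
    return "\r\n".join(lines) + rest


def strip_top_via(response: str) -> str:
    """Proxy forwarding a response: remove the topmost Via (its own)."""
    lines, rest = _split(response)
    idx = _first_via(lines)
    if idx >= 0:
        del lines[idx]
    return "\r\n".join(lines) + rest
```

Each hop must parse and rewrite these headers, so they cannot be hidden from intermediaries by end-to-end encryption; protecting each hop with TLS or IPSec avoids this conflict.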
Transport or network layer security encrypts signalling traffic, guaranteeing message
confidentiality and integrity and, sometimes, authentication. RFC 3261 proposes two ways of
securing the transport and network layers: Internet Protocol Security (IPSec) and Transport
Layer Security (TLS).
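With TLS, the SIP signalling connection to the next hop is encrypted and the server is authenticated by its certificate. The Python sketch below prepares such a connection using the standard ssl module; the host name is a placeholder, port 5061 is the default port for SIP over TLS defined in RFC 3261, and requiring TLS 1.2 or newer is an assumption reflecting current practice rather than a requirement stated in this document. It does not show how the RTC Client API itself configures its transport.

```python
import socket
import ssl
from typing import Optional

SIPS_DEFAULT_PORT = 5061   # default port for SIP over TLS (RFC 3261)


def make_sip_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client context that verifies the server certificate and host name."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # assumption: refuse older TLS
    return ctx


def connect_sip_tls(host: str, port: int = SIPS_DEFAULT_PORT,
                    timeout: float = 5.0) -> ssl.SSLSocket:
    """Open an encrypted, server-authenticated signalling connection."""
    ctx = make_sip_tls_context()
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        # server_hostname enables SNI and certificate host-name checking
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

A call such as `connect_sip_tls("sip.example.com")` (placeholder host) would fail with a certificate error if the server cannot prove its identity, which is exactly the server authentication described above.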
IPSec is a set of network-layer protocols for securing Internet Protocol (IP) communications.
IPSec also includes protocols for cryptographic key establishment. It can be used with
TC