July 1, 2006
The Web-CTI Revolution
Voice-over-IP solutions are typically marketed and sold under a variety
of short-term premises, but the most significant potential for VoIP is
the long-term capability for integrating voice services into your data
services architecture. Simply put, once you have converted your telephony
services into packetized network services, the voice services that used
to operate outside the data network become just another distributed application,
with the same potential for integration and development as any other
data service.
Of course, there are several Computer-Telephony Integration (CTI) interfaces
that are already in widespread use across a variety of different platforms
(see Existing CTI Interfaces at bottom), although most of the legacy
APIs are not generally useful for modern networked applications, given
that they tend to be platform specific. Over the last few years, vendors
and advocates have been pitching SIP as a fresh solution to this space,
since it already uses a control syntax that is ostensibly platform-neutral,
and already provides raw access to the kind of functions that are most
often needed by business applications. But while SIP does work well for
many things, it also approaches the data-integration problem from the
wrong direction—it requires integrating your data-centric applications
into the voice network, while most of us want to bring the communications
services into our data-oriented network application space.
This is where Web service interfaces to voice networks offer the most
promise. Theoretically, wrapping the traditional communication services
into well-defined XML messages which are then transferred across standardized
SOAP protocols will allow organizations to bring telephony and messaging
services directly to their applications, without requiring the applications
to become peer devices on the telephony network. Better yet, Web services
also have the potential to provide an abstract interface into the telephony
service that is independent of the underlying telephony protocols (they
can provide interfaces to devices on SIP and H.323 networks, and even
PSTN networks), and applications only have to become peer devices of
those networks if the application itself needs to be voice-enabled.
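As a rough illustration of what "wrapping" a telephony request looks like, the sketch below builds a SOAP 1.1 envelope around a call request using only the Python standard library. The operation name (makeCall), its child elements, and the example.com namespace are hypothetical and are not taken from any vendor's WSDL; only the SOAP envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 namespace
CTI_NS = "http://example.com/webcti"  # hypothetical service namespace


def build_make_call_envelope(calling_device: str, called_number: str) -> bytes:
    """Return a SOAP 1.1 envelope carrying a hypothetical makeCall request."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The request element and its children are illustrative names only.
    request = ET.SubElement(body, f"{{{CTI_NS}}}makeCall")
    ET.SubElement(request, f"{{{CTI_NS}}}callingDevice").text = calling_device
    ET.SubElement(request, f"{{{CTI_NS}}}calledNumber").text = called_number
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    print(build_make_call_envelope("4001", "5551234").decode("utf-8"))
```

The application only assembles and posts a document like this one. It never has to speak SIP, H.323, or any other signaling protocol itself.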
More to the point, some IP-PBX vendors are already shipping Web Service
interfaces for their systems today, so this is potentially possible right
now. Unfortunately, most of what is available today is very rough first-generation
stuff, and the industry as a whole is many years away from seamless interoperability
between telephony systems and user applications via Web services interfaces.
However, there are a handful of vendors who are shipping products with
these interfaces today, and there is also some ongoing standards activity
that is worth watching. IT folk should be aware of these efforts and
use this information for planning purposes, but should not expect to
be able to buy cross-system, standardized solutions today.
The Promise
On the surface, Web service interfaces to traditional telephony systems
(referred to as “Web-CTI” in this article) don’t appear
to be all that different from existing interfaces. All of them provide
a collection of functions that computer-based applications can tap into,
and the only real visible difference is in the transport mechanisms.
For example, the platform-specific interfaces usually rely on local transports
(such as plain old RS-232 serial connections), while network-oriented
interfaces use network protocols to extend the interface across a data
network, and from a cursory examination the same appears to be true for
Web-CTI interfaces too.
However, the real difference with Web-CTI interfaces is in their additional
layering. Whereas the traditional interfaces merely extend the
telephony API outwards (but still require the data-oriented applications
to conform to that interface), Web-CTI interfaces provide an abstract
service-control layer that is separate from the telephony layer. Moreover,
the application interface looks and feels like any other application-oriented
interface, and does not require the connected applications to become
telephony devices. Executing a telephony task is the same as any other
kind of task—you can initiate a phone call just as easily as you
can issue a database lookup (or anything else that is available through
a parallel Web service), and you can do so without having to become a
peer device on the telephony network.
This concept is illustrated in Figure 1. In the Web-CTI model, a simple
WSDL interface exposes a variety of telephony and communication functions
to the application plane, thereby allowing the applications to make use
of whatever communication services are needed. Meanwhile, the back-end
systems do whatever is needed to make the task actually succeed, whether
that be placing a call through one of the available telecommunications
interfaces, or communicating with an IVR system through a serial line.
In other words, the Web-CTI model provides an abstract interface for applications
to use which is entirely separate from the telephony and communications
infrastructure, which is a significant difference from the traditional
interfaces that simply extend the telephony infrastructure outwards without
any separation.
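The separation between the application-facing interface and the telephony back-ends can be sketched in a few lines of Python. Every class and method name here is hypothetical and chosen only for illustration; a real Web-CTI service would publish its operations through WSDL, and the back-ends would drive real signaling rather than return labels.

```python
from abc import ABC, abstractmethod
from typing import Dict


class TelephonyBackend(ABC):
    """Anything that can actually place a call (hypothetical interface)."""

    @abstractmethod
    def place_call(self, caller: str, callee: str) -> str:
        """Place a call and return a back-end specific call identifier."""


class SipBackend(TelephonyBackend):
    def place_call(self, caller: str, callee: str) -> str:
        # A real back-end would speak SIP here; this stub only labels the call.
        return f"sip:{caller}->{callee}"


class PstnBackend(TelephonyBackend):
    def place_call(self, caller: str, callee: str) -> str:
        # A real back-end would drive a PSTN trunk or gateway.
        return f"pstn:{caller}->{callee}"


class CallControlService:
    """The abstract service layer that applications see."""

    def __init__(self, backends: Dict[str, TelephonyBackend], routes: Dict[str, str]):
        self._backends = backends
        self._routes = routes  # maps a device name to a back-end name

    def make_call(self, caller: str, callee: str) -> str:
        # The application never learns which network carries the call.
        backend = self._backends[self._routes[caller]]
        return backend.place_call(caller, callee)


if __name__ == "__main__":
    service = CallControlService(
        {"sip": SipBackend(), "pstn": PstnBackend()},
        {"desk-4001": "sip", "lobby-fax": "pstn"},
    )
    print(service.make_call("desk-4001", "5551234"))
```

The calling code is identical whether the device sits on a SIP network or a PSTN line, which is the point of the extra layer.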
But while this separation makes integrating control functions into
data-oriented applications simpler, it’s also important to recognize
that it imposes a wall between the two worlds as well. In particular,
applications do not have access to the voice media through the Web-CTI
interface—while they can manipulate calls all day long, they cannot
actually participate in the call through this interface. If you need
to have your users and/or systems join a voice call, you will still have to
bring them to the telephony network (whether that be SIP/RTP, a POTS
line, or whatever).
It’s also important to recognize that this area is in its early
stages, and the industry as a whole is a long way from seeing this kind
of promise in a widely-available, standardized form. In particular, different
kinds of functionality and implementation models are still being fleshed
out, and several more years of work will be needed before the industry
can coalesce around a set of practical and functionally-delineated standards.
On the one hand, different kinds of users are bringing different kinds
of demands to the table, and vendors are having to address those needs
differently. For example, some organizations will be most interested
in call-control features for customer-relationship applications, while
other organizations will want to integrate presence and instant messaging
functions into their corporate applications. Meanwhile, developers of
third-party products will likely want to have low-level access to telephony
functions as well as media services (such as are usually needed for third-party
voicemail and IVR systems), while desktop users may just want to integrate
their contact management application into the overall telephony service
or simply be able to manipulate basic features like presence and call-forwarding
through some kind of high-level interface.
On the other side of the coin, vendors and the standards bodies are
also pursuing their own functionality targets. For example, most of the
first-wave products from the IP PBX vendors are focusing on high-level
call-control functions that are abstracted from the underlying technology,
with the intention of expanding into related technologies over time.
Meanwhile, the current crop of standards are generally focused on their
particular segment of the telecom industry, and it’s only through serendipity
if those standards also provide the kind of functionality that’s
needed by broader markets.
All told, it is highly unlikely that a single monolithic standard will
emerge anytime soon that addresses all of the desired functions, especially
one that has a universally-applicable level of granularity. Instead,
multiple specifications are likely to evolve that each provide targeted
functionality, with consolidation around a handful of standards happening
three or four years from now. Simply put, things are going to get a lot
more complicated before they get simpler.
The Players
About a dozen IP PBX vendors are currently shipping Web service interfaces
to their systems, but most of those interfaces are aimed at administrative
tasks, such as managing the users and phones attached to those systems,
or configuring the PBX itself. We could only find three products that
are capable of performing rudimentary call-management tasks through a
Web services interface to their IP PBX today: Avaya’s Application
Enablement Services, Sphere Communications’ Sphericall, and Siemens’ HiPath
8000. All of the other vendors we spoke to were unable to meet our minimum
requirements, or were unwilling to discuss their implementation in detail.
For example, Cisco’s line of IP PBX systems does not yet have
the ability to manage calls through a general Web-CTI interface, although
they do make use of Web services for some configuration and administrative
tasks. It is theoretically possible to use some of these interfaces to
emulate a phone device in software and achieve some rudimentary integration,
but this is not documented and probably would not provide sufficient
functionality. Furthermore, Cisco representatives that we spoke to said
that their short-term strategy was to continue consolidating their various
acquisitions around common local interfaces, while relying on third-party
vendors like Metreos to provide additional development tools and services.
But while Metreos does indeed have a compelling CTI development platform,
they do not have a suitable Web-CTI interface as of yet. Meanwhile, BlueNote
Networks says that they are developing a Web-CTI interface to their IP
PBX line, but we were not allowed to examine the interface or its documentation.
Curiously, while vendors have been cautious with implementation, there
has been some significant standards development in this area (usually
it’s the other way around). In particular, the ECMA Web services
interface for CSTA and the Parlay Web services interface for OSA are
both relatively complete, and both of them have been available for a
couple of years now. Unfortunately, standards only matter when they are
implemented, and we could not find any IP PBX vendors that had fully
embraced these efforts.
The ECMA Standards
The most noteworthy of the existing Web-CTI standards is the collection
of specifications from the European Computer Manufacturers Association
(ECMA). The ECMA specifications are focused on call-management functions,
and are already widely implemented in private exchange gear typically
found on enterprise networks (this includes Avaya and Siemens IP PBXs,
which both use a subset of the ECMA collection).
At the heart of the relevant ECMA standards is CSTA (Computer-Supported
Telecommunications Applications) as defined by ECMA-269, which essentially describes
a generalized API for telephony applications to use when communicating
with other services and devices. ECMA-269 defines over 130 functions,
ranging from basic call-control tasks to operational features such as
putting a device into a Do-Not-Disturb condition, and also describes
ASN.1 encoding rules for those functions. However, these specifications
are heavily focused on call- and device-management, and are missing many
important non-telephony functions, such as presence and instant messaging.
The CSTA specification was subsequently supplemented by ECMA-323, which
defines XML encoding rules as an alternative to the ASN.1 encoding rules,
and also provides examples for use with different SOAP bindings. ECMA-323
was further supplemented by ECMA-348, which provides a standard WSDL description
of the XML encoding, along with examples for use with SOAP/HTTP.
We do not know of any vendors that implement ECMA-348.
Avaya and Siemens both implement portions of ECMA-269 and ECMA-323 (usually
implemented as XML-over-TCP), but that’s as close as we’ve
seen. Furthermore, many of the vendors we spoke to have expressed the
opinion that the ECMA standards are too low-level and complex for wide-scale
adoption outside the vendor community. Given the broad adoption of CSTA
however, we feel it is highly probable that these standards will continue
to be adopted in some form.
The Parlay Standards
The other significant set of standards is the Parlay collection of
specifications, as published by the vendor consortium of the same name.
The Parlay standards are frequently used to provide application interfaces
to carrier networks, and are therefore somewhat common in carrier-class
systems and the associated application-development platforms, but they
are not at all common on enterprise telephony gear as of this moment.
However, IT organizations that want to integrate public-network telephony
devices and services into their CTI applications as peers of local devices
will likely need to work with Parlay at some point. Furthermore, enterprise
application platform vendors like BEA, IBM and Oracle do support Parlay
in their “carrier” product lines, and it’s likely that one
or more of those tools will bring some of that functionality into the
corporate space, dragging the Parlay interfaces along.
The core Parlay specifications were developed by the Parlay
consortium in conjunction with the European Telecommunications Standards
Institute (ETSI) and the 3rd Generation Partnership Project (3GPP is
the oversight body for the 3G digital-cellular technology). These interfaces
form the API layer of the 3GPP Open Service Architecture (OSA), and are
generally referred to as the Parlay/OSA APIs. These APIs are intended
to be portable across multiple development environments, and the specification
describes their use with CORBA and JAIN, and also includes a WSDL definition.
Separately, there is also a subset specification called Parlay/X which
describes a lighter and higher-level set of APIs that is optimized for
use with Web service interfaces in particular. Whereas Parlay/OSA provides
asynchronous access to numerous low-level functions, Parlay/X provides
synchronous access to a much smaller number of functions.
However, the Parlay/X dictionary maps quite well to the kinds of functions
that corporate CTI developers might want to use, with high-level functions
for call-control, conferencing, presence, messaging, address book management,
and so forth (there are also functions that are more suitable for traditional
carrier networks too, such as functions to manage ring-tones and billing
information). This makes Parlay/X an interesting specification, even
if it is not widely used in corporate telephony environments yet.
One potential problem with the Parlay model is that it is heavily layered.
For example, in those cases where OSA provides the network-native application
interface (as is the case with 3G cellular networks), Parlay/OSA simply
exists as a programmable service, but in other cases the Parlay support
has to be provided by a gateway of some kind. Since Parlay/X represents
a subset of the Parlay/OSA APIs, it is usually implemented as a
gateway as well. This means that the full Parlay stack often requires two
gateways: one between Parlay/X and Parlay/OSA, and another between Parlay/OSA
and the native telephony network.
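Conceptually, the first of those gateways is a blocking facade over a callback-driven API. The Python sketch below shows that pattern in miniature. It does not use real Parlay/OSA or Parlay/X operation names; the classes, methods, and status strings are all hypothetical, and the short timer merely stands in for the network delivering a result later.

```python
import threading
from typing import Callable, List


class AsyncCallControl:
    """Stand-in for a callback-driven, low-level API (hypothetical)."""

    def route_call(self, callee: str, on_result: Callable[[str], None]) -> None:
        # Returns immediately; the outcome is delivered later on another thread.
        timer = threading.Timer(0.05, on_result, args=(f"connected:{callee}",))
        timer.start()


class SyncCallControl:
    """Blocking facade over the asynchronous API (hypothetical)."""

    def __init__(self, low_level: AsyncCallControl, timeout: float = 2.0):
        self._low_level = low_level
        self._timeout = timeout

    def make_call(self, callee: str) -> str:
        done = threading.Event()
        result: List[str] = []

        def on_result(status: str) -> None:
            result.append(status)
            done.set()

        self._low_level.route_call(callee, on_result)
        # Block the caller until the callback fires or the timeout expires.
        if not done.wait(self._timeout):
            raise TimeoutError(f"no result for call to {callee}")
        return result[0]


if __name__ == "__main__":
    print(SyncCallControl(AsyncCallControl()).make_call("5551234"))
```

The facade is simple for the caller, but each extra layer of this kind adds latency and another place for state to get out of step.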
Worse though is that there is no real support for Parlay/X
in the IP PBX market—we do not know of any vendors who offer it
at the current time. However, if application vendors begin pushing into
this space, or if enterprise IT developers start clamoring to expand
their applications into cellular networks, there is some likelihood that
the market will adapt to those demands.
Sphericall Web Services
The most comprehensive Web-CTI interface available today is the Web
Services SDK for Sphere Communications’ Sphericall IP PBX offering.
The Web services component exposes the Sphericall IP PBX services through
a SOAP-compliant WSDL interface, and provides a lightweight, synchronous
messaging interface that is optimized for end-user development. It provides
functions for third-party call-control, conferencing, call-recording,
presence and status, instant messaging, number lookup, and call-history
lookups. Sphericall Web Services also provides some administrative and
event notification functions, which further rounds out the interface.
Multiple Sphericall PBX systems can be installed in a clustering environment,
and third-party devices can also be connected through the TAPI and SMDI
local interfaces.
The call-handling, conferencing, call-recording, and IM/presence
functions appear to be more than suitable for most purposes. Moreover,
Sphericall is the only Web services offering that has a sufficiently
comprehensive interface at this time, and none of the other implementations
that we looked at were as broadly usable.
Sphere has also implemented the most complete session management model,
with support for asynchronous bi-directional communications over the
SOAP channel. Sphere uses semi-permanent session identifiers to maintain
long-term state across transactions, coupled with a client-side “fetchEvents” function.
In this model, the client opens a connection, asks for any new events
that are associated with the session, and then enters a timeout condition
while waiting for event messages to arrive. If no notifications are received
within a specified interval, the client will eventually time out, and
then reconnect with the server to restart the process. Cumulatively,
this provides for bi-directional asynchronous session-level event-messaging
over SOAP, which none of the other implementations offer.
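The polling pattern described here can be sketched with an in-memory stand-in for the server. Only the "fetchEvents" name comes from Sphere's interface; the session class, the event name, and the function signatures below are hypothetical, and a real client would issue a SOAP request where this sketch reads from a local queue.

```python
import queue
import threading
from typing import Callable, List


class SimulatedSession:
    """In-memory stand-in for a server-side session (hypothetical)."""

    def __init__(self) -> None:
        self._events: "queue.Queue[str]" = queue.Queue()

    def push_event(self, event: str) -> None:
        self._events.put(event)

    def fetch_events(self, timeout: float) -> List[str]:
        """Block until an event arrives or the timeout expires."""
        try:
            first = self._events.get(timeout=timeout)
        except queue.Empty:
            return []  # timed out; the caller is expected to poll again
        events = [first]
        while True:  # drain anything else that is already queued
            try:
                events.append(self._events.get_nowait())
            except queue.Empty:
                return events


def poll_events(session: SimulatedSession, handler: Callable[[str], None],
                timeout: float = 0.2, max_polls: int = 5) -> None:
    """Repeatedly fetch events, re-polling after each timeout."""
    for _ in range(max_polls):
        for event in session.fetch_events(timeout):
            handler(event)


if __name__ == "__main__":
    session = SimulatedSession()
    # Simulate the PBX raising an event shortly after the client starts waiting.
    threading.Timer(0.05, session.push_event, args=("callDelivered",)).start()
    poll_events(session, print, timeout=0.2, max_polls=2)
```

Because the client always initiates the connection, this approach works through firewalls and proxies that would block a server-initiated callback.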
There are a couple of other interesting features in the Sphericall Web
services implementation worth noting. For one, presence and status information
can be set through the Web services interface, meaning that you can
have your application change the user’s call-status automatically
(such as changing an operator’s status to reflect the fact that
they are talking to a customer whenever they release a call from an incoming
queue). Also, the Sphericall IP PBX has a feature called “forwarding
profiles” which allow for user-defined call-routing, and those
features are also partially exposed through the Web services interface.
Finally, Sphere also provides a simulation server which can be used for
off-line development and testing, which will probably prove to be extremely
useful for most in-house developers. Overall, the Sphericall Web services
interface is pretty comprehensive, and is by far the most complete offering
available today.
Avaya Application Enablement Services
Avaya’s Web service interfaces are part of their Application
Enablement Services (AES) offering, which is an add-on gateway to Avaya’s
IP PBX products. Actually, Avaya has two different interfaces in AES
that are of interest to us: there is a high-level first-generation Web
services interface based on WSDL and SOAP, and there is also an XML-over-TCP
interface based on ECMA-323.
AES also has a handful of classic interfaces (including
JTAPI, TSAPI, and CSTA over ASN.1, among others), as well as their own
proprietary interfaces, all of which are implemented on the public side
of the gateway for applications to tap into. On the back side, the gateway
uses Avaya’s proprietary CLAN protocol to communicate with the
Avaya IP PBX systems, which in turn implement the local signaling protocol(s)
needed for the canonical telephony functions to work.
The WSDL/SOAP Web services interface has functions for managing user
accounts and settings, functions for managing the system and devices,
and functions for managing call-related activities. At the present time,
the range of telephony-related interfaces is pretty small, and is limited
to functions for creating a call, answering an incoming call, conferencing
and transferring calls, and managing sessions. There are no functions for
presence or instant messaging, voice-recording, call-history lookups,
or much else.
However, the XML-over-TCP interface, which Avaya refers to as the Communications
Manager API (CMAPI) XML SDK, is much more comprehensive. For example,
the current XML SDK contains 238 CSTA-specific XSD files and 52 Avaya-specific
XSD files, ranging from call-control and device-management features down
to ancillary features like call-recording and playback. As with the WSDL/SOAP
interface, there are no functions for presence or instant messaging in
the XML SDK as of yet.
Siemens HiPath 8000
Siemens’ HiPath 8000 is billed as a carrier-grade, software-based
IP PBX solution that is generally sold into very-large networks. Siemens
also has a line of add-on products, including the OpenScape presence
and collaboration platform, the Xpressions unified messaging system,
and the ProCenter call-center platform. Currently these layered products
use CSTA or SIP to talk to the IP PBX, but Siemens says that their strategic
plan is to eventually provide SOA interfaces that can support all of
these kinds of products directly.
The current HiPath 8000 v2 software release (which just started shipping
in April) is somewhat below that target objective, but is a good indication
of their strategic direction. At the moment, the HiPath 8000 provides
some basic administrative interfaces for device and user configuration
(these were also present in the v1 release), and also has some rudimentary
call management functions for call-setup and disconnect, call-history
lookups, and address book management tasks.
However, Siemens says that all of the HiPath 8000 internal functions
are already represented in XML, and they are continuing to productize
the low-level interfaces into high-level WSDL. In particular, the 2.1
release due out this summer is likely to have advanced call-control functions,
and may also include presence and messaging interfaces, although Siemens
would not commit to product details or a release schedule.
Overall, the HiPath 8000 architecture seems to be well designed for
scalability purposes, given that it already uses SIP and CSTA for most
of its internal functionality. If Siemens is able to productize these
functions into a usable WSDL/SOAP interface, they will have a very strong,
standards-driven offering in this space.
Existing CTI Interfaces
There are well over a dozen computer-telephony APIs that are commonly
used in the industry already, although they are usually tied to specific
platforms or computing models. In broad terms, these interfaces can be
broken down into the following four basic categories:
Local APIs: These interfaces allow an application running on
a computer system to communicate with local telephony devices. Some of
the common interfaces in this space are Microsoft’s Telephony API
(TAPI) for Windows, the Novell/Lucent cross-platform Telephony Server
API (TSAPI), and the Java Telephony API (JTAPI). At the other end of
the local spectrum, CSTA is also widely used by telephony equipment providers
and third-party software developers, and generally falls into this category.
Another legacy interface in this category would be Simplified Message
Desk Interface (SMDI), which defines a serial-line protocol for phone
systems to use when communicating with a voicemail system.
PSTN/Carrier APIs: These interfaces allow
applications to communicate with devices and services on modern public
networks, such as digital cellular and ISDN networks. Due to the limited
access to these networks, these interfaces are generally only used by
companies who provide services on those networks directly (i.e., the telephone
companies themselves, or a third-party firm who provides things like
stock quotes and alerts to their customers), but otherwise are not commonly
used by most organizations. The two big interfaces here are Parlay/OSA
from the Parlay vendor consortium, and the Java APIs for Integrated Networking
(JAIN).
Standardized Network Protocols: Data-network
telephony protocols such as H.323 and SIP provide a variety of control
services as a necessary part of their functionality, and those control
services can be incorporated into data-centric applications if the application
developer is willing and able to implement the protocol stack into the
application directly. In this kind of scenario, the applications do not
use APIs to interface with telephony devices and services, but instead
act as direct peers to the devices and services on the voice network.
Vendor-Proprietary Interfaces: Apart from
the common interfaces described above, there are also some vendor-specific,
proprietary interfaces that are sometimes supported in software
packages and development tools. Two examples here are the Cisco-specific
Skinny Call Connection Protocol (SCCP) that is widely used on Cisco VoIP
gear, and the Avaya-specific Communications Manager API (CMAPI) that
is used for connecting third-party gear to Avaya’s soft-switch.
Written by Eric A. Hall.
Copyright © 2006 CMP Media. Used with permission.