Specifications and Requirements
This topic provides general information related to prerequisites, as well as system capabilities and considerations.
Prerequisites
In general, these integration guidelines assume that:
A U‑Capture Core Services Kubernetes cluster has been deployed.
One or more SIPREC Collectors have been deployed.
Each Collector has been provisioned with “basic configuration” to connect the Collector with the U‑Capture Core Services.
Call Recording Capabilities
Call Metadata
Standard Metadata
Metadata Extension Modules
Custom (Configuration Based)
Sonus SBC
Integration Dependent Features
Codec Support
A wide range of codecs are supported. See SIP RAM Configuration for details on setting codec priority.
Security
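To secure data on the wire, Uniphore supports Secure SIP and Secure RTP (SRTP). SRTP keys are negotiated in the SDP using SDES (SDP Security Descriptions, a mechanism for negotiating security keys). Because SDES carries the keys inside the SDP, they are only protected when the SIP signalling itself is encrypted. We therefore recommend using Secure SIP in conjunction with SRTP to keep the SRTP keys from being compromised.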
Secure SIP
To provide a secure SIP interface, TLS and DTLS are supported as SIP transports. See SIP RAM Configuration for more information on how to set the enabled SIP transports. The following protocol versions are supported; earlier, insecure protocol versions are not.
TLS 1.3
TLS 1.2
DTLS 1.2
SIPS URIs are supported, but not required with either TLS or DTLS transports.
The cipher suites listed below are supported, in order of preference. On first start with either TLS or DTLS transports enabled, the Diffie-Hellman (DH) parameters are automatically generated. This happens only once, on first start, but takes several minutes to complete. The OpenSSL command line window must not be closed during this operation.
TLS_AES_256_GCM_SHA384
TLS_CHACHA20_POLY1305_SHA256
TLS_AES_128_GCM_SHA256
TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384
TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
TLS_DHE_RSA_WITH_AES_128_CBC_SHA256
TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
TLS_DHE_RSA_WITH_AES_256_CBC_SHA
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
TLS_DHE_RSA_WITH_AES_128_CBC_SHA
TLS_RSA_WITH_AES_256_GCM_SHA384
TLS_RSA_WITH_AES_128_GCM_SHA256
TLS_RSA_WITH_AES_256_CBC_SHA256
TLS_RSA_WITH_AES_128_CBC_SHA256
TLS_RSA_WITH_AES_256_CBC_SHA
TLS_RSA_WITH_AES_128_CBC_SHA
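As a rough illustration of the protocol floor described above, the following Python sketch builds a client-side TLS context that refuses anything older than TLS 1.2. It uses only the standard-library ssl module. The function name is hypothetical and not part of the product, and DTLS is not covered because the standard library does not provide it.

```python
import ssl


def make_sip_tls_client_context() -> ssl.SSLContext:
    """Build a client context limited to TLS 1.2 and newer (hypothetical helper)."""
    ctx = ssl.create_default_context()            # certificate verification stays enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.1 and older
    return ctx


ctx = make_sip_tls_client_context()
```

A test client built this way can only complete a handshake using TLS 1.2 or TLS 1.3, which matches the supported versions listed above.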
Secure RTP
Secure RTP operates in three modes:
Best Effort (default): If an SDP offer with a supported crypto attribute is received, Uniphore will respond with a crypto attribute in its SDP answer and SRTP will be used for the media session. The SDP media protocol may be either RTP/AVP or RTP/SAVP.
Mandatory: If an SDP offer with media protocol RTP/SAVP and a supported crypto attribute is received, Uniphore will respond with RTP/SAVP and a crypto attribute and SRTP will be used for the media session. Otherwise, the session will be rejected with a SIP 488 response.
Disabled: If an SDP offer with media protocol RTP/AVP is received, Uniphore will respond with RTP/AVP and RTP will be used for the media session. Otherwise, the session will be rejected with a SIP 488 response.
See SIP RAM Configuration for more information on how to set the SRTP mode.
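The three modes can be read as a small decision table. The sketch below encodes only what the descriptions above state. The names Mode, Outcome and srtp_outcome are hypothetical, and the one case the text does not describe (Best Effort with no supported crypto attribute) is returned as UNSPECIFIED rather than guessed.

```python
from enum import Enum


class Mode(Enum):
    BEST_EFFORT = "best_effort"
    MANDATORY = "mandatory"
    DISABLED = "disabled"


class Outcome(Enum):
    SRTP = "srtp"                # answer carries a crypto attribute
    RTP = "rtp"                  # plain RTP media session
    REJECT_488 = "488"           # session rejected with SIP 488
    UNSPECIFIED = "unspecified"  # behaviour not described in the text


def srtp_outcome(mode: Mode, media_proto: str, has_supported_crypto: bool) -> Outcome:
    """Map an SDP offer to the outcome described for each SRTP mode."""
    if mode is Mode.BEST_EFFORT:
        # The media protocol may be RTP/AVP or RTP/SAVP; only the crypto attribute matters.
        return Outcome.SRTP if has_supported_crypto else Outcome.UNSPECIFIED
    if mode is Mode.MANDATORY:
        if media_proto == "RTP/SAVP" and has_supported_crypto:
            return Outcome.SRTP
        return Outcome.REJECT_488
    # Mode.DISABLED: only plain RTP/AVP offers are accepted.
    return Outcome.RTP if media_proto == "RTP/AVP" else Outcome.REJECT_488
```

For example, an RTP/AVP offer that carries a supported crypto attribute yields SRTP in Best Effort mode but a 488 rejection in Mandatory mode.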
The following cipher suites are supported in order of preference.
AEAD_AES_256_GCM (RFC 7714)
AEAD_AES_128_GCM (RFC 7714)
AES_256_CM_HMAC_SHA1_80 (RFC 6188)
AES_256_CM_HMAC_SHA1_32 (RFC 6188)
AES_192_CM_HMAC_SHA1_80 (RFC 6188)
AES_192_CM_HMAC_SHA1_32 (RFC 6188)
AES_CM_128_HMAC_SHA1_80 (RFC 3711)
AES_CM_128_HMAC_SHA1_32 (RFC 3711)
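One way to read "in order of preference" is that, among the suites a far end offers, the highest-ranked supported suite is chosen. The text does not spell out the selection procedure, so the sketch below is an assumption, and choose_suite is a hypothetical name.

```python
# Preference order copied from the list above (most preferred first).
SRTP_SUITE_PREFERENCE = [
    "AEAD_AES_256_GCM",
    "AEAD_AES_128_GCM",
    "AES_256_CM_HMAC_SHA1_80",
    "AES_256_CM_HMAC_SHA1_32",
    "AES_192_CM_HMAC_SHA1_80",
    "AES_192_CM_HMAC_SHA1_32",
    "AES_CM_128_HMAC_SHA1_80",
    "AES_CM_128_HMAC_SHA1_32",
]


def choose_suite(offered):
    """Return the most preferred supported suite among those offered, or None."""
    offered_set = set(offered)
    for suite in SRTP_SUITE_PREFERENCE:
        if suite in offered_set:
            return suite
    return None  # nothing in the offer is supported
```

Under this reading, an offer listing AES_CM_128_HMAC_SHA1_80 and AEAD_AES_128_GCM would select AEAD_AES_128_GCM, regardless of the order in which the far end listed them.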
SRTP data is stored in its encrypted form. This is for performance reasons, as cryptographic operations such as decryption are slow. The SRTP keys are stored alongside the SRTP data.
Session Parameters
The following SDES session parameters from RFC 4568 are supported. The far end may use them to override the session defaults. These overrides will be honoured by the Collector.
UNENCRYPTED_SRTP: SRTP will not be encrypted. This is useful where regulations do not permit encryption, but data integrity is still desired.
UNAUTHENTICATED_SRTP: SRTP will not be authenticated. This is not recommended.
UNENCRYPTED_SRTCP: SRTCP will be authenticated, but not encrypted. This is useful where regulations do not permit encryption, but data integrity is still desired.
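In RFC 4568, session parameters follow the key parameters on the a=crypto line, separated by spaces. The sketch below splits such a line and separates the three parameters listed above from any others. The function name and the returned dictionary layout are hypothetical, and the key material in the example is a dummy value.

```python
SUPPORTED_SESSION_PARAMS = {
    "UNENCRYPTED_SRTP",
    "UNAUTHENTICATED_SRTP",
    "UNENCRYPTED_SRTCP",
}


def parse_crypto_attribute(line: str) -> dict:
    """Split 'a=crypto:<tag> <suite> <key-params> [<session-params>]' into fields."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not a crypto attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("crypto attribute needs a tag, a suite and key parameters")
    tag, suite, key_params, *session_params = fields
    return {
        "tag": int(tag),
        "suite": suite,
        "key_params": key_params,
        "supported_params": [p for p in session_params if p in SUPPORTED_SESSION_PARAMS],
        "other_params": [p for p in session_params if p not in SUPPORTED_SESSION_PARAMS],
    }


# Dummy key material: 40 base64 characters, not a real key.
example = "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:" + "A" * 40 + " UNENCRYPTED_SRTCP KDR=1"
parsed = parse_crypto_attribute(example)
```

Here parsed["supported_params"] holds UNENCRYPTED_SRTCP, while KDR=1 lands in parsed["other_params"] because it is not one of the parameters listed above.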
Secure RTCP
If SRTP is in use, you can select whether to expect RTCP to be secure (SRTCP) as well (see [SRTCP]UseSRTCPWhenSRTP in RTP Collector Configuration). Receiver Reports transmitted will be secured using SRTCP and the negotiated keys.
3rd Party SIP Requirements
Initiating a Recording Session
This integration uses the SIP RAM’s SIPREC interpreter to ensure that the SIPREC metadata and call audio is correctly captured by the Collector. In order to use the interpreter, the SIP INVITE message sent from the SIP UAC (SIP User Agent Client) must meet the following criteria (as per RFC 7866 section 6.1.1) to establish a SIPREC recording session:
The “Require” header must contain “siprec”.
The “Contact” header URI must contain “+sip.rec”.
If these criteria are not met, the SIP INVITE message will be processed as though it is a basic SIP session.
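The two criteria can be sketched as a simple header check. This is only an illustration, not the interpreter's actual logic: the header dictionary, the function name and the example addresses are hypothetical, and the feature tag is passed in as a parameter so that the value stated in the criteria above is used as-is.

```python
def is_siprec_invite(headers: dict, feature_tag: str) -> bool:
    """Return True if the INVITE headers satisfy both criteria listed above."""
    # "Require" is a comma-separated list of option tags.
    require = headers.get("Require", "")
    option_tags = {tag.strip().lower() for tag in require.split(",")}
    # The feature tag appears as a parameter on the Contact header.
    contact = headers.get("Contact", "").lower()
    return "siprec" in option_tags and feature_tag.lower() in contact


FEATURE_TAG = "+sip.rec"  # value as given in the criteria above

siprec = is_siprec_invite(
    {"Require": "siprec", "Contact": "<sip:src@example.com>;+sip.rec"}, FEATURE_TAG
)
basic = is_siprec_invite({"Contact": "<sip:alice@example.com>"}, FEATURE_TAG)
```

In this sketch the first INVITE is classified as a SIPREC session and the second as a basic SIP session.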
For SIPREC sessions, call recordings are not started on receipt of a SIP INVITE message. Instead, they are started based on the recording metadata; see Recording Metadata below.
Recording Metadata
Recording metadata must conform to the schema defined in RFC 7865. The SRS accepts capturing metadata bodies with either of the following content types contained in the SIP INVITE (multipart), UPDATE, and BYE messages:
“application/rs-metadata” (RFC 7866).
“application/rs-metadata+xml” (RFC 7865).
Note
A re-INVITE message may also contain recording metadata if a content type is specified.
Capturing metadata bodies may be either complete or partial updates. Provided the initial metadata body is complete, subsequent metadata bodies may be partial. Complete metadata bodies will replace the copy of the metadata held by the SRS. Partial metadata bodies will update the copy of the metadata held by the SRS.
To be viable to record, the metadata must contain at least one participant. Each participant must have at least one participant-session association, i.e. a person entering a call.
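The viability rule can be checked against a metadata body with the standard-library XML parser. The sketch below assumes the element and attribute names used in the RFC 7865 examples later in this section. The function name and the sample bodies are hypothetical, and the check is meant for the complete copy of the metadata, not for a partial update in isolation.

```python
import xml.etree.ElementTree as ET

NS = {"rec": "urn:ietf:params:xml:ns:recording:1"}


def is_viable(metadata_xml: str) -> bool:
    """True if there is at least one participant and every participant
    has at least one participant-session association."""
    root = ET.fromstring(metadata_xml)
    participants = {
        p.get("participant_id") for p in root.findall("rec:participant", NS)
    }
    if not participants:
        return False
    associated = {
        a.get("participant_id")
        for a in root.findall("rec:participantsessionassoc", NS)
    }
    return participants <= associated  # every participant is associated


VIABLE = (
    "<recording xmlns='urn:ietf:params:xml:ns:recording:1'>"
    "<participant participant_id='p1'/>"
    "<participantsessionassoc participant_id='p1' session_id='s1'/>"
    "</recording>"
)
NOT_VIABLE = (
    "<recording xmlns='urn:ietf:params:xml:ns:recording:1'>"
    "<participant participant_id='p1'/>"
    "</recording>"
)
```

With these samples, is_viable(VIABLE) is true, while is_viable(NOT_VIABLE) is false because participant p1 has no association.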
Ending a Recording Session
Receipt of a SIP BYE message implies that all call recordings started through recording metadata should be ended.
The following is an example of a complete recording metadata body.
<?xml version="1.0" encoding="UTF-8"?>
<recording xmlns='urn:ietf:params:xml:ns:recording:1'>
  <datamode>complete</datamode>
  <group group_id="7+OTCyoxTmqmqyA/1weDAg==">
    <associate-time>2010-12-16T23:41:07Z</associate-time>
  </group>
  <session session_id="hVpd7YQgRW2nD22h7q60JQ==">
    <group-ref>7+OTCyoxTmqmqyA/1weDAg==</group-ref>
  </session>
  <participant participant_id="srfBElmCRp2QB23b7Mpk0w==">
    <nameID aor="sip:bob@biloxi.com">
      <name xml:lang="en">Bob</name>
    </nameID>
  </participant>
  <participant participant_id="zSfPoSvdSDCmU3A3TRDxAw==">
    <nameID aor="sip:Paul@biloxi.com">
      <name xml:lang="en">Paul</name>
    </nameID>
  </participant>
  <stream stream_id="UAAMm5GRQKSCMVvLyl4rFw==" session_id="hVpd7YQgRW2nD22h7q60JQ==">
    <label>1</label>
  </stream>
  <stream stream_id="i1Pz3to5hGk8fuXl+PbwCw==" session_id="hVpd7YQgRW2nD22h7q60JQ==">
    <label>2</label>
  </stream>
  <sessionrecordingassoc session_id="hVpd7YQgRW2nD22h7q60JQ==">
    <associate-time>2010-12-16T23:41:07Z</associate-time>
  </sessionrecordingassoc>
  <participantsessionassoc participant_id="srfBElmCRp2QB23b7Mpk0w==" session_id="hVpd7YQgRW2nD22h7q60JQ==">
    <associate-time>2010-12-16T23:41:07Z</associate-time>
  </participantsessionassoc>
  <participantsessionassoc participant_id="zSfPoSvdSDCmU3A3TRDxAw==" session_id="hVpd7YQgRW2nD22h7q60JQ==">
    <associate-time>2010-12-16T23:41:07Z</associate-time>
  </participantsessionassoc>
  <participantstreamassoc participant_id="srfBElmCRp2QB23b7Mpk0w==">
    <send>i1Pz3to5hGk8fuXl+PbwCw==</send>
    <recv>UAAMm5GRQKSCMVvLyl4rFw==</recv>
  </participantstreamassoc>
  <participantstreamassoc participant_id="zSfPoSvdSDCmU3A3TRDxAw==">
    <send>UAAMm5GRQKSCMVvLyl4rFw==</send>
    <recv>i1Pz3to5hGk8fuXl+PbwCw==</recv>
  </participantstreamassoc>
</recording>
The following is an example of a partial recording metadata body.
<?xml version="1.0" encoding="UTF-8"?>
<recording xmlns='urn:ietf:params:xml:ns:recording:1'>
  <datamode>partial</datamode>
  <participant participant_id="srfBElmCRp2QB23b7Mpk0w==">
    <nameID aor="sip:bob@biloxi.com">
      <name xml:lang="en">Bob</name>
    </nameID>
  </participant>
  <participantsessionassoc participant_id="srfBElmCRp2QB23b7Mpk0w==" session_id="hVpd7YQgRW2nD22h7q60JQ==">
    <disassociate-time>2010-12-16T23:41:07Z</disassociate-time>
  </participantsessionassoc>
</recording>
Call Recordings
Unless an extension specifies otherwise, the SRS will treat the first participant in the metadata as the participant to be captured. Call recordings begin and end based on the participant-session associations specified in the metadata by <participantsessionassoc> elements.
Beginning a Call Recording
A call recording will begin when a participant-session association (<participantsessionassoc>) for a recorded participant does not contain a disassociate time (see Ending a Call Recording) and there is not already a call recording in progress for that participant-session association.
Ending a Call Recording
A call recording will end when a participant-session association contains a disassociate time (specified in <disassociate-time>) and there is a call recording in progress for that participant-session association.
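The begin and end rules amount to a small state machine keyed on the participant-session association. The following is a minimal sketch under the assumption that the participant is already known to be a recorded participant. The function name, the set of active recordings and the returned strings are hypothetical.

```python
def apply_association(active: set, participant_id: str, session_id: str,
                      disassociate_time=None) -> str:
    """Update the set of in-progress recordings for one participant-session
    association and report what happened: 'begin', 'end' or 'no-op'."""
    key = (participant_id, session_id)
    if disassociate_time is None:
        # No disassociate time: begin only if not already recording.
        if key not in active:
            active.add(key)
            return "begin"
    elif key in active:
        # Disassociate time present and a recording is in progress: end it.
        active.remove(key)
        return "end"
    return "no-op"


active = set()
events = [
    apply_association(active, "p1", "s1"),                          # first association
    apply_association(active, "p1", "s1"),                          # repeated, no change
    apply_association(active, "p1", "s1", "2010-12-16T23:41:07Z"),  # disassociated
    apply_association(active, "p1", "s1", "2010-12-16T23:41:07Z"),  # already ended
]
```

Repeated metadata for the same association therefore neither restarts a recording in progress nor ends one that has already ended.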
Registration
Registration is a mechanism for associating a user’s Address of Record (AoR) (User URI/ID) with one or more devices (e.g. phone, mobile phone). A few things to consider are:
Some integrations may require the U‑Capture Collector to register with a SIP registrar. For example, the Collector may register as a user that can be conferenced into a call as a silent monitor, thus establishing a call recording session.
In situations where user authentication is required, digest authentication is supported.
If the route from the Collector to the registrar can traverse one or more SIP proxies, SIP outbound (RFC 5626) is supported.
See the SIP RAM Configuration for information on how to configure registration.
Networking Considerations
Sufficient network bandwidth between the SRC (Session Recording Client) and SRS (Session Recording Server) is required to support the necessary concurrent call recordings. This can be calculated using the following formula:
bandwidth = bitrate × channels
Where bitrate is the codec bitrate and channels is the number of concurrent channels to be recorded (times one for mono, or two for stereo).
Due to the small size and high frequency of RTP packets, it’s not possible to utilise the full potential bandwidth before the network becomes a bottleneck and packet loss ensues.
A dedicated network adapter should be used for media traffic.
A network adapter should be added for each 100 Mbps of required network bandwidth.
Note
This assumes a 1000BASE-T network.
See RTP Collector Configuration for details on specifying which network adapters to utilize for media traffic.
Where the SRS and SRC reside on separate network subnets, the SRC must be able to route to the SRS and the SRS must also be able to route to the SRC.
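The bandwidth formula and the one-adapter-per-100-Mbps guideline can be worked through as follows. This is a sketch only: the function names are hypothetical, "channels" is interpreted as concurrent recordings multiplied by one (mono) or two (stereo), and the bitrate is the codec bitrate exactly as in the formula, with no allowance for packet headers.

```python
import math


def required_bandwidth_bps(codec_bitrate_bps: int, concurrent_recordings: int,
                           stereo: bool) -> int:
    """bandwidth = bitrate x channels, with two channels per stereo recording."""
    channels = concurrent_recordings * (2 if stereo else 1)
    return codec_bitrate_bps * channels


def adapters_needed(bandwidth_bps: int, per_adapter_bps: int = 100_000_000) -> int:
    """One adapter per 100 Mbps of required bandwidth, and at least one."""
    return max(1, math.ceil(bandwidth_bps / per_adapter_bps))


# Example: 1000 concurrent stereo recordings of G.711 (64 kbit/s per channel).
bandwidth = required_bandwidth_bps(64_000, 1000, stereo=True)
adapters = adapters_needed(bandwidth)
```

In this example the requirement is 128,000,000 bit/s (128 Mbps), which under the guideline above calls for two network adapters for media traffic.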
Firewall Ports
Note
These are default ports; you can change them as detailed in Configuration. Also note that the ports listed here are specific to this integration.
Voice Activity Detection
To avoid recording silence (and therefore reduce storage requirements), the Collector can perform Voice Activity Detection (VAD) on received audio. VAD can use a combination of different signal analysis algorithms to identify voice. VAD is performed on each channel by analysing a 10-millisecond window. Each algorithm is performed on the audio within the window and a decision on whether that window contains voice or not is based on the levels falling within a set of configured thresholds.
Using the default configuration, it will take as little as 20 milliseconds of voice activity on a channel to trigger recording. Once voice activity is no longer detected, recording will continue for 5 seconds before ending. This prevents pauses in speech from causing excessive stopping and restarting of recording.
Enabling VAD does impact Collector performance and will reduce the maximum possible concurrent recordings. VAD is only supported with calls using the G.711 audio codec and cannot be performed on calls using SRTP.
The signal analysis algorithms currently used by VAD are:
Short-Term Energy: STE is essentially a method of determining how loud a window of audio is. Windows of voice will have higher energy. Windows of non-voice will have lower energy.
Zero Crossing Rate: ZCR represents how often within a window of audio, the signal’s amplitude alternates between positive and negative. Windows of voice will have a lower ZCR. Windows of non-voice will have a higher ZCR.
ZCR is useful for distinguishing between tones and voice. Tones are often as loud as voice, but in general will be significantly higher frequency.
See [VoiceActivityDetection] in Integration Adapter Configuration for more information on configuring VAD.
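To make the two measures concrete, the sketch below computes Short-Term Energy and Zero Crossing Rate for a single 10-millisecond window (80 samples at the 8 kHz sampling rate of G.711) and combines them into a voice decision. It is not the Collector's implementation: the function names and threshold values are hypothetical, and the 5-second continuation after voice stops is not modelled.

```python
import math

SAMPLE_RATE_HZ = 8000                    # G.711 sampling rate
WINDOW_SAMPLES = SAMPLE_RATE_HZ // 100   # 10 ms window -> 80 samples


def short_term_energy(window):
    """Mean squared amplitude of the window (how loud it is)."""
    return sum(s * s for s in window) / len(window)


def zero_crossing_rate(window):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(
        1 for a, b in zip(window, window[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(window) - 1)


def window_has_voice(window, min_energy, max_zcr):
    """Voice is assumed to be loud enough AND to cross zero rarely enough."""
    return (short_term_energy(window) >= min_energy
            and zero_crossing_rate(window) <= max_zcr)


def tone(freq_hz, amplitude=8000.0):
    """Generate one window of a sine tone for demonstration purposes."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
        for n in range(WINDOW_SAMPLES)
    ]


low = tone(200)        # low-frequency signal, stands in for voiced speech
high = tone(3000)      # high-frequency tone of the same loudness
silence = [0.0] * WINDOW_SAMPLES

# Hypothetical thresholds chosen only for this demonstration.
MIN_ENERGY, MAX_ZCR = 1e6, 0.25
```

With these demonstration thresholds, the 200 Hz window is classified as voice, while the 3000 Hz tone is rejected by its high ZCR despite being equally loud, and the silent window is rejected by its low energy.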
Tested Configuration & Compatibility
Integration testing of a basic interop between SIPREC and U‑Capture (SIPREC Collector and U‑Capture Core Services) was completed using SIP phones and a Sonus SBC.
There are no known compatibility limitations for this integration with other recording types.