Skype for Business Mediation Event IDs 25036, 25051, 25052 & 25061

Environment

  • Skype for Business Server 2015, approx. 4500 Enterprise Voice users.
  • Enterprise Edition Front End Pool (3 x Front End Servers).
  • Mediation Pool (2 x Mediation Servers).
  • Active/Standby SIP Trunks terminating on AudioCodes Mediant 2600 E-SBCs.

Issue

A Skype for Business integrated Contact Centre solution was deployed to support 120 agents.  Switchover from the legacy system to the new Contact Centre solution happened early one morning, and all went well until late morning, when the Contact Centre hit one of its daily peaks.

As PSTN call volume increased (mainly inbound) to approx. 100 concurrent calls, calls started to drop.  Mediation marked the primary trunk as down, and flipped to the standby trunk.

The following errors were logged on the Mediation Servers.

Log Name: Lync Server
Source: LS Mediation Server
Date: 14/10/2017 14:02:46
Event ID: 25036
Task Category: (1030)
Level: Error
Keywords: Classic
User: N/A
Computer: SFBMED1.x500.co.uk

Description:
Mediation Server cannot reach the Trunk. Additional failures will not be logged.
Cannot reach Trunk: SBC1.x500.co.uk;trunk=SBC1.x500.co.uk

Cause:
Trunk IP address is invalid or Trunk is not functioning correctly.

Resolution:
Check the Trunk address and Trunk status.

Log Name: Lync Server
Source: LS Mediation Server
Date: 14/10/2017 14:03:29
Event ID: 25051
Task Category: (1030)
Level: Error
Keywords: Classic
User: N/A
Computer: SFBMED1.x500.co.uk

Description:
There was no response from a Trunk to an OPTIONS request sent by the Mediation Server.  The Trunk, SBC1.x500.co.uk;trunk=SBC1.x500.co.uk, is not responding to an OPTIONS request sent by the Mediation Server service.
DNS Resolution Failure: False

Exception: Microsoft.Rtc.Signaling.ConnectionFailureException:Unable to establish a connection. ---> System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult) at Microsoft.Rtc.Internal.Sip.TcpTransport.OnReceived(Object arg)

Cause:
The Mediation Server service cannot communicate with the Trunk Service over SIP due to network connectivity issues.

Resolution:
Please ensure network connectivity and availability of the Trunk for the Mediation Server service to be able to function correctly.

Log Name: Lync Server
Source: LS Mediation Server
Date: 14/10/2017 14:03:44
Event ID: 25051
Task Category: (1030)
Level: Error
Keywords: Classic
User: N/A
Computer: SFBMED1.x500.co.uk

Description:
There was no response from a Trunk to an OPTIONS request sent by the Mediation Server.  The Trunk, SBC1.x500.co.uk;trunk=SBC1.x500.co.uk, is not responding to an OPTIONS request sent by the Mediation Server service.
DNS Resolution Failure: False

Exception:
Microsoft.Rtc.Signaling.ConnectionFailureException:Operation failed because the network connection was not available. ---> System.Net.Sockets.SocketException (0x80004005): An attempt was made to access a socket in a way forbidden by its access permissions 192.168.153.20:5068
at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult) at Microsoft.Rtc.Internal.Sip.TcpTransport.OnConnected(Object arg)

Cause:
The Mediation Server service cannot communicate with the Trunk Service over SIP due to network connectivity issues.

Resolution:
Please ensure network connectivity and availability of the Trunk for the Mediation Server service to be able to function correctly.

The above error was repeated at 14:04:45, 14:05:46, and 14:06:48.

Log Name: Lync Server
Source: LS Mediation Server
Date: 14/10/2017 14:06:48
Event ID: 25052
Task Category: (1030)
Level: Error
Keywords: Classic
User: N/A
Computer: SFBMED1.x500.co.uk

Description:
The Trunk peer cannot be contacted. Mediation server will keep trying; however additional failures will not be logged.  The Trunk peer, SBC1.x500.co.uk;trunk=SBC1.x500.co.uk, is not responding to OPTIONS requests sent by the Mediation Server service.
DNS Resolution Failure: False

Exception:
Microsoft.Rtc.Signaling.ConnectionFailureException:Operation failed because the network connection was not available. ---> System.Net.Sockets.SocketException (0x80004005): An attempt was made to access a socket in a way forbidden by its access permissions 192.168.153.20:5068
at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult) at Microsoft.Rtc.Internal.Sip.TcpTransport.OnConnected(Object arg)

Log Name: Lync Server
Source: LS Mediation Server
Date: 14/10/2017 14:06:48
Event ID: 25061
Task Category: (1030)
Level: Error
Keywords: Classic
User: N/A
Computer: SFBMED1.x500.co.uk

Description:
The Mediation Server service has encountered a major connectivity problem with these Trunk peer(s).
Affected PSTN Trunk Service Cluster(s):
SBC1.x500.co.uk;trunk=SBC1.x500.co.uk

Cause:
MEDIATIONSERVER_GATEWAY_OPTIONS_FAILED (Event ID: 25051) was recorded 5 times.  Check other MOM alerts for more details.  The MEDIATIONSERVER_GATEWAY_IP_NOT_AVAILABLE (Event ID: 25036), MEDIATIONSERVER_GATEWAY_TLS_NEGOTIATION_FAILED (Event ID: 25040) are examples of events that signal connectivity error conditions with the Trunk peer.

Resolution:
If the failure is MEDIATIONSERVER_GATEWAY_IP_NOT_AVAILABLE (Event ID: 25036), make sure that the correct listening IP and port for the Trunk have been configured in the PSTN Trunk object in management store and that the Trunk is up and running and able to accept incoming connections from the Mediation Server.  If the failure is MEDIATIONSERVER_GATEWAY_TLS_NEGOTIATION_FAILED (Event ID: 25040), make sure that both the Mediation Server and the Trunk are configured for TLS and that the CA for the Trunk’s certificate is the trusted certificate path on the Mediation Server and the CA for the Mediation Server’s certificate is in the trusted certificate path on the Trunk.
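The 25061 "Cause" text says Event ID 25051 was recorded 5 times, which lines up with the 25051 timestamps listed above. As a rough illustration only, the Python sketch below replays those timestamps and shows when the fifth failure lands. The threshold of 5 is read off the event text rather than taken from Microsoft documentation, and the function name is made up for this example.

```python
from datetime import datetime

# Timestamps of Event ID 25051 as reported in the log excerpts above.
OPTIONS_FAILURES = ["14:03:29", "14:03:44", "14:04:45", "14:05:46", "14:06:48"]

# Assumption: the value 5 is inferred from the 25061 "Cause" text.
THRESHOLD = 5


def escalation_time(failures, threshold=THRESHOLD):
    """Return the time at which the failure count reaches the threshold, or None."""
    times = sorted(datetime.strptime(t, "%H:%M:%S") for t in failures)
    if len(times) < threshold:
        return None
    return times[threshold - 1].strftime("%H:%M:%S")


print(escalation_time(OPTIONS_FAILURES))  # 14:06:48
```

The result matches the timestamp on the 25052 and 25061 events, which is consistent with the trunk being marked down on the fifth failed OPTIONS request.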

After approx. 6 minutes, the trunk recovered without any admin intervention.

Log Name: Lync Server
Source: LS Mediation Server
Date: 14/10/2017 14:12:55
Event ID: 25062
Task Category: (1030)
Level: Information
Keywords: Classic
User: N/A
Computer: SFBMED1.x500.co.uk

Description:
The Mediation Server service Trunk peer connectivity failure has been resolved.
Most recent PSTN Trunk Service Cluster to recover:
SBC1.x500.co.uk;trunk=SBC1.x500.co.uk

Log Name: Lync Server
Source: LS Mediation Server
Date: 14/10/2017 14:13:13
Event ID: 25038
Task Category: (1030)
Level: Information
Keywords: Classic
User: N/A
Computer: SFBMED1.x500.co.uk

Description:
Mediation Server is able to reach the Trunk SBC1.x500.co.uk;trunk=SBC1.x500.co.uk

Investigation

Everything was checked, including the SBCs, SIP trunks, SIP channel usage and network connectivity, and no problems were found.

I created some load tools to generate PSTN calls using the Twilio infrastructure, and then ran the tools against the environment outside of operational hours.  Exactly the same issue occurred when the simultaneous PSTN call load got to approx. 100 calls.
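The load tools themselves are not shown here. As a hedged sketch of the general idea only, the Python below ramps up calls in batches until a target count is reached. `place_call` is a hypothetical hook that starts one PSTN call and returns an identifier; in a real run it might wrap a telephony provider's API, and the called number would need to answer and hold each call so that the calls overlap. The batch size and pause are arbitrary illustrative values.

```python
import time
from typing import Callable, List


def ramp_calls(place_call: Callable[[], str], target: int, step: int = 10,
               pause_s: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Start calls in batches of `step` until `target` calls have been started.

    place_call is a hypothetical hook supplied by the caller; this function
    only controls the pacing and records the returned identifiers.
    """
    if target < 0 or step < 1:
        raise ValueError("target must be >= 0 and step must be >= 1")
    started: List[str] = []
    while len(started) < target:
        batch = min(step, target - len(started))
        for _ in range(batch):
            started.append(place_call())
        print(f"{len(started)} calls started")
        if len(started) < target:
            sleep(pause_s)  # give the platform time to settle between batches
    return started


if __name__ == "__main__":
    # Dry run with a fake dialler and no waiting, to show the pacing only.
    counter = iter(range(1_000))
    ramp_calls(lambda: f"fake-call-{next(counter)}", target=30, step=10,
               sleep=lambda _s: None)
```

Ramping in steps, rather than placing every call at once, makes it easier to see at roughly what concurrency the trunk starts to fail.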

Under normal conditions, you can ping the SBC from the Skype for Business Mediation Servers.

Pinging SBC1.x500.co.uk [192.168.153.20] with 32 bytes of data:

Reply from 192.168.153.20: bytes=32 time=1ms TTL=61
Reply from 192.168.153.20: bytes=32 time=1ms TTL=61
Reply from 192.168.153.20: bytes=32 time=1ms TTL=61
Reply from 192.168.153.20: bytes=32 time=1ms TTL=61

Ping statistics for 192.168.153.20:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 1ms, Average = 1ms

From the Mediant 2600, you can ping the Skype for Business Mediation Servers.

Mediant 2600# ping 192.168.190.10 repeat 5
PING 192.168.190.10 (192.168.190.10) 56(84) bytes of data.
64 bytes from 192.168.190.10: icmp_seq=1 ttl=255 time=0.121 ms
64 bytes from 192.168.190.10: icmp_seq=2 ttl=255 time=0.101 ms
64 bytes from 192.168.190.10: icmp_seq=3 ttl=255 time=0.113 ms
64 bytes from 192.168.190.10: icmp_seq=4 ttl=255 time=0.107 ms
64 bytes from 192.168.190.10: icmp_seq=5 ttl=255 time=0.121 ms
--- 192.168.190.10 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.101/0.112/0.121/0.014 ms

During testing I noticed that when the issue occurs, you can no longer ping the SBC from the Mediation Servers, or vice versa. The ping returns "General failure", meaning the packet never even gets on the wire.  Interestingly, a different server on the same subnet (and running on the same hypervisor) can still ping the SBC.

Pinging SBC1.x500.co.uk [192.168.153.20] with 32 bytes of data:

General failure.
General failure.
General failure.
General failure.

Ping statistics for 192.168.153.20:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss)

This clearly points at an issue with the Mediation Servers.
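The two exception messages in the 25051 events carry the same hint. The first is the standard text for Winsock error 10060 (WSAETIMEDOUT), and the second is the standard text for Winsock error 10013 (WSAEACCES). A timeout can have many causes, but an access-permissions error on an outbound connect usually means something on the local machine refused the socket operation. The Python sketch below is an illustration under that assumption: it probes a TCP port and sorts the failure into "local" or "remote/network". The function names and category wording are made up for this example.

```python
import errno
import socket

# Winsock error codes (Windows Sockets error list).
WSAEACCES = 10013      # "...forbidden by its access permissions"
WSAETIMEDOUT = 10060   # "...did not properly respond after a period of time"


def classify(code) -> str:
    """Map a socket error code to a rough category (heuristic, not definitive)."""
    if code in (WSAEACCES, errno.EACCES):
        return "local: blocked on this host (host firewall or HIPS is a likely suspect)"
    if code in (WSAETIMEDOUT, errno.ETIMEDOUT):
        return "remote/network: no response before the timeout"
    return "other: see the raw error code"


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        return classify(WSAETIMEDOUT)
    except OSError as exc:
        # On Windows the Winsock code is in exc.winerror; elsewhere use errno.
        return classify(getattr(exc, "winerror", None) or exc.errno)


print(classify(WSAEACCES))
```

Run on a Mediation Server during an incident, something like `probe("192.168.153.20", 5068)` would be expected to fall into the "local" category, while the same probe from the unaffected server on the same subnet would connect.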

The cause turned out to be McAfee HIPS (Host Intrusion Prevention), which identified the increased UDP port usage as an attack [Attack type: Port Scan (UDP)].

[Image: HIPS1]

The host (Primary SBC) is blocked for 10 minutes, explaining why the primary trunk recovered without any admin intervention.
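The 10-minute block and the "approx. 6 minutes" of trunk downtime noted earlier are consistent once you measure from different starting points. A quick check with the event timestamps, written as a small Python sketch, assumes the HIPS block started at about the time of the first 25036 error:

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M:%S"

# Timestamps copied from the event log entries above.
FIRST_ERROR = "14/10/2017 14:02:46"   # Event ID 25036, trunk first unreachable
MARKED_DOWN = "14/10/2017 14:06:48"   # Event ID 25061, major connectivity problem
RECOVERED = "14/10/2017 14:12:55"     # Event ID 25062, connectivity restored


def elapsed(start: str, end: str):
    """Return the time between two event log timestamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


print(elapsed(FIRST_ERROR, RECOVERED))  # 0:10:09
print(elapsed(MARKED_DOWN, RECOVERED))  # 0:06:07
```

Roughly ten minutes pass between the first failure and recovery, and roughly six between the trunk being marked down and recovery.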

[Image: HIPS3]

Note that once calls had failed over to the standby trunk, the same thing happened against the standby SBC/trunk when concurrent calls again reached approx. 100.

Resolution

I disabled McAfee HIPS on the Mediation Servers.  Alternatively, you could create exclusions for the traffic/hosts.
