Elastix: Outbound calling issue



  • We are looking at an Elastix system that has had outbound calling issues twice this week. We have 'resolved' the issue by force-restarting the PBX, but that is not a workable fix in the middle of the day.

    IO wait has mostly stayed below 1% in the window below, peaking at about 1.4%, as best as I can see.

    07:30:02 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
    07:40:01 AM       all      0.03      0.00      0.09      0.23      0.00     99.65
    07:50:01 AM       all      0.03      0.00      0.10      0.41      0.00     99.46
    08:00:01 AM       all      0.03      0.00      0.09      0.27      0.00     99.60
    08:10:01 AM       all      0.03      0.00      0.09      0.33      0.00     99.54
    08:20:01 AM       all      0.03      0.00      0.10      0.29      0.00     99.58
    08:30:01 AM       all      0.06      0.00      0.09      0.26      0.00     99.60
    08:40:01 AM       all      0.03      0.00      0.09      0.30      0.00     99.58
    08:50:01 AM       all      0.03      0.00      0.10      0.21      0.00     99.66
    09:00:01 AM       all      0.03      0.00      0.09      0.30      0.00     99.57
    09:10:01 AM       all      0.04      0.00      0.09      0.31      0.00     99.56
    09:20:01 AM       all      0.03      0.00      0.11      0.28      0.00     99.58
    09:30:01 AM       all      0.13      0.00      0.17      0.75      0.00     98.95
    09:40:01 AM       all      0.88      0.00      0.15      0.40      0.00     98.57
    09:50:01 AM       all      1.40      0.00      0.19      0.64      0.00     97.78
    10:00:01 AM       all      3.17      0.00      0.44      1.41      0.00     94.98
    10:10:01 AM       all      3.39      0.00      0.69      0.87      0.00     95.06
    10:20:01 AM       all      2.98      0.00      0.60      0.74      0.00     95.68
    10:30:01 AM       all      2.77      0.00      0.32      0.56      0.00     96.35
    Average:          all      0.27      0.00      0.13      1.03      0.00     98.57
    
    free -m
                 total       used       free     shared    buffers     cached
    Mem:          1001        615        386          0          7         68
    -/+ buffers/cache:        539        462
    Swap:         2015          0       2015
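
For anyone reading these `sar` dumps by eye, a small filter can pull out the worst %iowait sample. This is only a sketch: `peak_iowait` is a hypothetical helper name, and it assumes the 12-hour timestamp layout shown above (time, AM/PM, then `all`), so the field positions would shift under a 24-hour locale.

```shell
# Reads `sar` CPU output on stdin (12-hour clock, as posted in this thread)
# and prints the sample with the highest %iowait as "<time> <AM|PM> <value>".
# Header, banner and "Average:" lines are skipped because their third field
# is not "all".
peak_iowait() {
    awk '$3 == "all" && (!seen || $7 + 0 > max) {
             seen = 1; max = $7 + 0; when = $1 " " $2
         }
         END { if (seen) printf "%s %.2f\n", when, max }'
}
```

Fed only the lines pasted above, it prints `10:00:01 AM 1.41`. On the live box the usual call would be `sar | peak_iowait`, which covers the whole day rather than just this window.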
    
    


  • Current GUI panel report

    pbxgui.png

    It does not seem that the system is using any of the swap memory...



  • What else should I be looking for to try to isolate this issue? There doesn't seem to be a problem with the SIP provider, or the PBX itself. But I could easily miss something.



  • We've looked at SAR reports previously, CPU and memory are definitely not the issue.



  • Next step is scouring the system logs around the time of the loss of connectivity.
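
One way to narrow the log search is to filter the Asterisk log down to warnings and errors around the failure time. The sketch below rests on assumptions: `asterisk_events_at` is a hypothetical helper name, `/var/log/asterisk/full` is assumed to be where this install writes its full log, and the timestamp prefix has to match whatever `dateformat` is set in `logger.conf`.

```shell
# Prints WARNING/ERROR/NOTICE lines whose bracketed timestamp starts with
# the given prefix. Arguments: <timestamp-prefix> [logfile]
asterisk_events_at() {
    prefix=$1
    logfile=${2:-/var/log/asterisk/full}   # assumed log location
    # index() == 1 means the line begins with "[<prefix>"
    awk -v p="[$prefix" \
        'index($0, p) == 1 && /(WARNING|ERROR|NOTICE)\[/' "$logfile"
}
```

For example, `asterisk_events_at '2015-08-14 10:0'` would show events logged from 10:00 to 10:09 on that date. The timestamp here is a placeholder, not the actual outage time.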



  • One thing that may need to be looked at is the Host. This PBX is running on a VMware host. Could the Host somehow be causing the issue and not the PBX itself?



  • @g.jacobse said:

    One thing that may need to be looked at is the Host. This PBX is running on a VMware host. Could the Host somehow be causing the issue and not the PBX itself?

    Not realistically.



  • Outbound calling via SIP or PRI?



  • @scottalanmiller said:

    @g.jacobse said:

    One thing that may need to be looked at is the Host. This PBX is running on a VMware host. Could the Host somehow be causing the issue and not the PBX itself?

    Not realistically.

    Why not? Let's assume he only has one NIC in the VM host and it gets saturated by another VM. Before you can dismiss it, don't you have to ask about the host config?



  • @coliver said:

    Outbound calling via SIP or PRI?

    SIP via Vitelity.

    The Vitelity logs have been investigated.



  • @Dashrender said:

    Why not? Let's assume he only has one NIC in the VM host and it gets saturated by another VM. Before you can dismiss it, don't you have to ask about the host config?

    Because that would not impact solely outbound call setup. If that were happening we would lose inbound calling, existing calls, remote access, etc. Call setup uses trivial network resources; if the NIC were saturated to that point, everything in the office would be broken AND rebooting the PBX would not fix it.



  • @scottalanmiller said:

    @Dashrender said:

    Why not? Let's assume he only has one NIC in the VM host and it gets saturated by another VM. Before you can dismiss it, don't you have to ask about the host config?

    Because that would not impact solely outbound call setup. If that were happening we would lose inbound calling, existing calls, remote access, etc. Call setup uses trivial network resources; if the NIC were saturated to that point, everything in the office would be broken AND rebooting the PBX would not fix it.

    Good point!



  • Memory seems to be creeping up again.

    06:20:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
    06:30:01 PM       all      1.51      0.00      0.19      0.39      0.00     97.91
    06:40:01 PM       all      1.44      0.00      0.18      0.75      0.00     97.62
    06:50:01 PM       all      1.42      0.00      0.19      0.65      0.00     97.74
    07:00:01 PM       all      1.39      0.00      0.18      1.00      0.00     97.43
    07:10:01 PM       all      1.55      0.00      0.19      0.43      0.00     97.84
    07:20:01 PM       all      1.42      0.00      0.19      0.71      0.00     97.68
    07:30:01 PM       all      1.46      0.00      0.18      0.70      0.00     97.66
    07:40:01 PM       all      1.42      0.00      0.18      0.51      0.00     97.89
    07:50:01 PM       all      1.44      0.00      0.18      0.62      0.00     97.75
    08:00:01 PM       all      1.43      0.00      0.17      0.59      0.00     97.82
    08:10:01 PM       all      1.42      0.00      0.18      0.61      0.00     97.79
    Average:          all      0.91      0.00      0.20      0.84      0.00     98.05
    
    $ free -m
                 total       used       free     shared    buffers     cached
    Mem:          1001        607        394          0         63         90
    -/+ buffers/cache:        453        548
    Swap:         2015          0       2015
    
    $ /etc/init.d/httpd restart
    Stopping httpd:                                            [  OK  ]
    Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using pbx.[xxx].com for ServerName
                                                               [  OK  ]
    
    $  free -m
                 total       used       free     shared    buffers     cached
    Mem:          1001        336        665          0         63         90
    -/+ buffers/cache:        182        819
    Swap:         2015          0       2015
    [root@pbx ~]#
    

    The command /etc/init.d/httpd restart was suggested by @scottalanmiller.
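
The effect of the restart can be read off the `-/+ buffers/cache` line, which excludes buffers and page cache: 453 MB before and 182 MB after, so restarting httpd released about 271 MB. A rough sketch of taking that measurement automatically follows. `app_used_mb` and `report_released` are hypothetical helper names, and the parsing assumes the older `free` layout shown above (newer procps versions drop the `-/+ buffers/cache` line).

```shell
# Prints the "used" figure from the "-/+ buffers/cache" line of `free -m`,
# i.e. memory held by processes, excluding buffers and page cache.
app_used_mb() {
    awk '$1 == "-/+" { print $3 }'
}

# Runs the given command and reports how much process memory it released.
# Assumes `free -m` on this system still prints the "-/+" line.
report_released() {
    before=$(free -m | app_used_mb)
    "$@"
    after=$(free -m | app_used_mb)
    echo "released $((before - after)) MB"
}
```

A call such as `report_released /etc/init.d/httpd restart` would wrap the restart used in this post.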



  • mbkpbx3.png
    The memory drop at about 14:00 was the first time I ran it, and the second drop, just at 20:20, was the second time.



  • Any new issues since the Apache restart?



  • @scottalanmiller said:

    Any new issues since the Apache restart?

    They are no longer in the office today. I am not aware of any new issues or complaints. There is other host / server maintenance needed at this site this weekend, so I expect at some point between now and late Sunday both servers will be restarted.

    I will continue to monitor the system, but otherwise will not restart Apache again.



  • Elastix 2.4 or 2.5 by the way? Just curious.



  • @JaredBusch said:

    Elastix 2.4 or 2.5 by the way? Just curious.

    2.5 (sad trombone plays)



  • I'm not sure you ever actually specified what you mean by outbound calling issues. Do you get a busy signal, all circuits busy now, some other error, etc.?

    Check the trunk settings, and make sure you have disallow=all in there and allow=ulaw under it (need to check Vitelity documentation on this). If I remember right, Vitelity requires separate trunk setups (at least in the Elastix sense of the word trunk) for inbound and outbound calling. Hopefully you do not have allow=ulaw&g729 because I have seen instances where Elastix (at least in 2.4) will actually try to use g729 outbound, and you get one-way audio.
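
The codec advice above can be checked mechanically against the trunk's PEER Details. The following is a sketch only: `check_trunk_codecs` is a hypothetical helper, it expects the `key=value` lines pasted on stdin, and the "ulaw only" rule simply mirrors the suggestion in this post, which still needs confirming against Vitelity's documentation.

```shell
# Reads trunk PEER Details (key=value lines) on stdin and reports whether
# disallow=all appears before the allow= line and only ulaw is allowed.
check_trunk_codecs() {
    awk -F= '
        { gsub(/[ \t\r]/, "") }                       # strip whitespace
        $1 == "disallow" && $2 == "all" && !allow_seen { reset = 1 }
        $1 == "allow" {
            allow_seen = 1
            codecs = codecs (codecs ? "&" : "") $2    # collect allowed codecs
        }
        END {
            if (!reset)
                print "warn: disallow=all missing or after allow="
            else if (codecs != "ulaw")
                print "warn: allow=" codecs " (expected ulaw only)"
            else
                print "ok: disallow=all, allow=ulaw"
        }'
}
```

Pasting a trunk with `allow=ulaw&g729` into it produces a warning line, which is the case described above as causing one-way audio.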

    You could do a tcpdump on the Elastix box, make some calls when this situation happens, download the tcpdump to your machine, and open with Wireshark to see a visual flow of traffic. That may give you some insight.
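
A minimal capture for the tcpdump suggestion might look like the sketch below. The function name `capture_sip`, the output path, and the defaults are assumptions: it presumes SIP signalling runs over UDP port 5060 and captures on all interfaces unless told otherwise. It records signalling only, not the RTP audio.

```shell
# Captures SIP signalling to a pcap file until interrupted with Ctrl-C.
# Arguments: [interface] [output file] [SIP port]
#   -n    skip name resolution
#   -s 0  capture full packets rather than truncated ones
#   -w    write raw packets to a file for Wireshark
capture_sip() {
    tcpdump -n -i "${1:-any}" -s 0 -w "${2:-/tmp/sip-outbound.pcap}" \
        udp port "${3:-5060}"
}
```

Run `capture_sip` as root while placing a failing outbound call, stop it with Ctrl-C, then copy the pcap file to a workstation and open it in Wireshark.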



  • @NetworkNerd said:

    I'm not sure you ever actually specified what you mean by outbound calling issues. Do you get a busy signal, all circuits busy now, some other error, etc.?

    It was stated earlier that the trunk works inbound and outbound and then just suddenly stops making outbound calls. Inbound calls continue working.



  • @JaredBusch said:

    @NetworkNerd said:

    I'm not sure you ever actually specified what you mean by outbound calling issues. Do you get a busy signal, all circuits busy now, some other error, etc.?

    It was stated earlier that the trunk works inbound and outbound and then just suddenly stops making outbound calls. Inbound calls continue working.

    Yes, the system operates normally 90% of the time, then suddenly, twice in a week, the customer was unable to make outbound calls. They didn't report the exact error, but my guess is that they get an "all circuits are busy" message or a fast busy.

    I have not checked it yet. I was informed that, because of some updates from MS, all the servers were restarted yesterday.



  • @g.jacobse Why would updates from MS trigger restarts? Are they running on HyperV?



  • @scottalanmiller said:

    @g.jacobse Why would updates from MS trigger restarts? Are they running on HyperV?

    In this case, the DC reported there were updates available and a reboot was needed. So, while I have not verified it, all boxes were touched to ensure they were up to date and restarted as needed.



  • current GUI

    mbkpbx4.png



  • Sar:

     sar
    Linux 2.6.18-371.1.2.el5 (pbx.XXX.com)     08/16/2015
    
    12:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
    12:10:01 AM       all      0.05      0.00      0.09      1.81      0.00     98.06
    12:20:01 AM       all      0.04      0.00      0.10      1.12      0.00     98.74
    12:30:01 AM       all      0.07      0.00      0.09      0.95      0.00     98.89
    12:40:01 AM       all      0.05      0.00      0.09      1.17      0.00     98.69
    12:50:01 AM       all      0.04      0.00      0.10      1.64      0.00     98.22
    01:00:01 AM       all      0.05      0.00      0.09      1.81      0.00     98.05
    01:10:01 AM       all      0.04      0.00      0.09      1.53      0.00     98.34
    01:20:01 AM       all      0.04      0.00      0.10      1.05      0.00     98.81
    01:30:01 AM       all      0.07      0.00      0.10      1.37      0.00     98.46
    01:40:01 AM       all      0.04      0.00      0.09      1.57      0.00     98.30
    01:50:01 AM       all      0.04      0.00      0.10      1.35      0.00     98.51
    02:00:01 AM       all      0.04      0.00      0.09      0.96      0.00     98.91
    02:10:01 AM       all      0.04      0.00      0.09      1.65      0.00     98.21
    02:20:01 AM       all      0.04      0.00      0.10      1.35      0.00     98.51
    02:30:01 AM       all      0.07      0.00      0.09      1.55      0.00     98.29
    02:40:01 AM       all      0.04      0.00      0.08      1.18      0.00     98.70
    02:50:01 AM       all      0.04      0.00      0.10      1.18      0.00     98.68
    03:00:01 AM       all      0.04      0.00      0.08      1.17      0.00     98.71
    03:10:01 AM       all      0.05      0.00      0.09      1.91      0.00     97.95
    03:20:01 AM       all      0.04      0.00      0.11      0.91      0.00     98.94
    03:30:01 AM       all      0.07      0.00      0.10      1.16      0.00     98.67
    03:40:01 AM       all      0.04      0.00      0.09      1.10      0.00     98.77
    
    03:40:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
    03:50:01 AM       all      0.04      0.00      0.10      1.33      0.00     98.54
    04:00:01 AM       all      0.04      0.00      0.09      1.23      0.00     98.64
    04:10:01 AM       all      0.28      0.00      0.18      1.78      0.00     97.76
    04:20:01 AM       all      0.04      0.00      0.10      1.67      0.00     98.18
    04:30:01 AM       all      0.07      0.00      0.10      1.33      0.00     98.50
    04:40:01 AM       all      0.04      0.00      0.09      1.31      0.00     98.57
    04:50:01 AM       all      0.04      0.00      0.10      1.38      0.00     98.48
    05:00:01 AM       all      0.04      0.00      0.09      1.34      0.00     98.53
    05:10:01 AM       all      0.04      0.00      0.09      1.62      0.00     98.24
    05:20:01 AM       all      0.04      0.00      0.11      1.10      0.00     98.75
    05:30:01 AM       all      0.06      0.00      0.10      1.22      0.00     98.62
    05:40:01 AM       all      0.04      0.00      0.09      1.43      0.00     98.44
    05:50:01 AM       all      0.04      0.00      0.10      1.68      0.00     98.17
    06:00:01 AM       all      0.04      0.00      0.09      1.34      0.00     98.53
    06:10:01 AM       all      0.05      0.00      0.09      1.11      0.00     98.75
    06:20:01 AM       all      0.04      0.00      0.10      0.72      0.00     99.14
    06:30:01 AM       all      0.07      0.00      0.10      1.04      0.00     98.79
    06:40:01 AM       all      0.04      0.00      0.09      1.58      0.00     98.30
    06:50:01 AM       all      0.04      0.00      0.10      1.28      0.00     98.58
    07:00:01 AM       all      0.03      0.00      0.07      0.75      0.00     99.14
    07:10:01 AM       all      0.04      0.00      0.07      0.38      0.00     99.51
    07:20:01 AM       all      0.04      0.00      0.08      0.29      0.00     99.60
    
    07:20:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
    07:30:01 AM       all      0.06      0.00      0.09      0.30      0.00     99.56
    07:40:01 AM       all      0.04      0.00      0.08      0.27      0.00     99.61
    07:50:01 AM       all      0.04      0.00      0.09      0.27      0.00     99.61
    08:00:01 AM       all      0.03      0.00      0.07      0.25      0.00     99.64
    08:10:01 AM       all      0.04      0.00      0.07      0.34      0.00     99.55
    08:20:01 AM       all      0.04      0.00      0.08      0.42      0.00     99.46
    08:30:01 AM       all      0.06      0.00      0.08      0.32      0.00     99.55
    08:40:01 AM       all      0.03      0.00      0.07      0.36      0.00     99.54
    08:50:01 AM       all      0.04      0.00      0.08      0.29      0.00     99.59
    09:00:01 AM       all      0.03      0.00      0.07      0.31      0.00     99.58
    09:10:01 AM       all      0.04      0.00      0.08      0.39      0.00     99.50
    09:20:01 AM       all      0.04      0.00      0.09      0.34      0.00     99.53
    09:30:01 AM       all      0.06      0.00      0.08      0.28      0.00     99.58
    09:40:01 AM       all      0.03      0.00      0.07      0.30      0.00     99.60
    09:50:01 AM       all      0.04      0.00      0.08      0.33      0.00     99.55
    10:00:01 AM       all      0.04      0.00      0.07      0.37      0.00     99.52
    10:10:01 AM       all      0.04      0.00      0.08      0.37      0.00     99.52
    10:20:01 AM       all      0.03      0.00      0.08      0.31      0.00     99.58
    10:30:01 AM       all      0.06      0.00      0.08      0.65      0.00     99.22
    10:40:01 AM       all      0.03      0.00      0.07      0.37      0.00     99.53
    10:50:01 AM       all      0.04      0.00      0.08      0.35      0.00     99.53
    11:00:01 AM       all      0.04      0.00      0.07      0.31      0.00     99.58
    
    11:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
    11:10:01 AM       all      0.03      0.00      0.08      0.34      0.00     99.55
    11:20:01 AM       all      0.04      0.00      0.09      0.29      0.00     99.58
    11:30:01 AM       all      0.06      0.00      0.08      0.37      0.00     99.48
    11:40:01 AM       all      0.04      0.00      0.08      0.29      0.00     99.60
    11:50:01 AM       all      0.04      0.00      0.09      0.35      0.00     99.53
    12:00:01 PM       all      0.04      0.00      0.08      0.25      0.00     99.64
    12:10:01 PM       all      0.04      0.00      0.08      0.39      0.00     99.50
    12:20:01 PM       all      0.03      0.00      0.08      0.30      0.00     99.58
    Average:          all      0.05      0.00      0.09      0.90      0.00     98.97
    
    


  • @g.jacobse said:

    @scottalanmiller said:

    @g.jacobse Why would updates from MS trigger restarts? Are they running on HyperV?

    In this case, the DC reported there were updates available and a reboot was needed. So, while I have not verified it, all boxes were touched to ensure they were up to date and restarted as needed.

    What connection does a DC (Windows VM) needing an update have with a PBX? There is some connection here that we aren't being told.



  • Based on your screenshots, there was no reboot. Who told you that they rebooted?
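
Whether the PBX itself was rebooted can be confirmed on the box rather than from screenshots; `uptime` and `who -b` both show it directly. As a scriptable variant, the sketch below compares the kernel's recorded boot time with a cutoff. `booted_since` is a hypothetical helper name, and it assumes a Linux `/proc/stat` with a `btime` line.

```shell
# Succeeds if the system booted at or after the given time, expressed in
# seconds since the epoch. Boot time is read from the btime line in /proc/stat.
booted_since() {
    boot=$(awk '$1 == "btime" { print $2 }' /proc/stat)
    [ "$boot" -ge "$1" ]
}
```

With GNU `date`, a check such as `booted_since "$(date -d 'yesterday 00:00' +%s)" && echo "rebooted since yesterday"` would show whether the claimed restart actually reached this machine.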



  • @scottalanmiller said:

    @g.jacobse said:

    @scottalanmiller said:

    @g.jacobse Why would updates from MS trigger restarts? Are they running on HyperV?

    In this case, the DC reported there were updates available and a reboot was needed. So, while I have not verified it, all boxes were touched to ensure they were up to date and restarted as needed.

    What connection does a DC (Windows VM) needing an update have with a PBX? There is some connection here that we aren't being told.

    No - DC and the PBX Host are different machines



  • @g.jacobse said:

    No - DC and the PBX Host are different machines

    I know. So why does anything with the DC mean anything with the PBX?



  • @scottalanmiller said:

    Based on your screenshots, there was no reboot. Who told you that they rebooted?

    Reboot would have been before 14:00 EDT yesterday