
  • Voice quality issues (Juniper SSGs)

    About 6 months ago we replaced our WatchGuard firewalls with Juniper SSGs (multi-site), and we have been having voice quality issues ever since.

    Here is our setup.

    MPLS circuit between multiple sites, QOS implemented with DSCP tagging.
    DSCP tagging on the Shoretel server is turned on (value 184)
    Policies on the SSGs that add DSCP 46 tags to traffic coming from the SG server, as those packets are not tagged (per Shoretel documentation)
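
    The two numbers in the setup above (184 on the server, 46 on the SSG policies) are probably the same marking written two ways. This assumes the server setting is the full ToS/DS byte, while the SSG policy takes the 6-bit DSCP value, which sits in the upper six bits of that byte. A minimal Python sketch of the arithmetic (the function names are made up for illustration):

```python
def dscp_to_tos(dscp: int) -> int:
    """Return the ToS/DS byte for a 6-bit DSCP value (low two bits left at 0)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def tos_to_dscp(tos: int) -> int:
    """Return the 6-bit DSCP value carried in a ToS/DS byte."""
    if not 0 <= tos <= 255:
        raise ValueError("ToS byte must be in the range 0-255")
    return tos >> 2


if __name__ == "__main__":
    print(dscp_to_tos(46))   # 184
    print(tos_to_dscp(184))  # 46
```

    If that assumption holds, both settings describe EF (Expedited Forwarding) and do not conflict.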

    Here are some of our issues.

    119 events (excessive packet losses) from calls going across the WAN and sometimes the LAN.
    PCM stops responding after sitting idle for a while (> 30 minutes): we get an "invalid line handle" error on PCM, and it takes about 45 seconds for the call manager to become functional again. (I have a case open with ST; they have had me do Wireshark captures, etc., but can't find a problem.)

    Our network is segregated into Voice and Data VLANs, with tagging turned on on the phones. The SSG is also the router. Both the data and voice VLANs are in the Trust zone. We have an intra-zone policy that is set to any any any, and a Trust to MPLS zone policy that prioritizes the voice traffic as "real time".

    The switches we use are Foundry Gigabit (running in layer 2), with the SSG acting as the VLAN router. The ports going to the phones are configured as follows: (only pertinent info included)

    vlan 10 name Data-Default by port
    tagged ethe 0/1/1 to 0/1/2 ethe 0/1/7 ethe 0/1/9 to 0/1/48
    untagged ethe 0/1/6 ethe 0/1/8
    spanning-tree 802-1w
    !
    vlan 11 name Voice-Vlan by port
    tagged ethe 0/1/1 to 0/1/2 ethe 0/1/9 to 0/1/48
    untagged ethe 0/1/3 to 0/1/5
    spanning-tree 802-1w
    !
    interface ethernet 0/1/3
    port-name "Shoregear 90"
    no spanning-tree
    sflow-forwarding
    !
    interface ethernet 0/1/4
    port-name "Shoregear T-1"
    no spanning-tree
    sflow-forwarding
    !
    interface ethernet 0/1/5
    port-name "Shoretel Server"
    spanning-tree 802-1w admin-edge-port
    sflow-forwarding
    !
    interface ethernet 0/1/9
    dual-mode 10
    spanning-tree 802-1w admin-edge-port
    inline power
    voice-vlan 11
    sflow-forwarding
    !
    interface ethernet 0/1/10
    dual-mode 10
    spanning-tree 802-1w admin-edge-port
    inline power
    voice-vlan 11
    sflow-forwarding
    !
    interface ethernet 0/1/11
    dual-mode 10
    spanning-tree 802-1w admin-edge-port
    inline power
    voice-vlan 11
    sflow-forwarding
    !
    ......up to port 48

    SSG config (only pertinent lines)

    set interface "ethernet0/8.10" tag 10 zone "Trust" <--- Data
    set interface "ethernet0/8.11" tag 11 zone "Trust" <--- Voice
    set interface ethernet0/8.10 ip 10.10.1.1/24
    set interface ethernet0/8.10 route
    set interface ethernet0/8.11 ip 10.11.1.1/24
    set interface ethernet0/8.11 route

    set policy id 20 from "Trust" to "Trust" "Any" "Any" "ANY" permit log
    set policy id 20
    set log session-init
    exit
    set policy id 24 from "Trust" to "mpls" "Voice LAN" "Any" "VOIP->1025-UDP" permit log count traffic gbw 256 priority 0 mbw 1000 dscp enable value 46
    set policy id 24 application "IGNORE"
    set policy id 24
    set policy id 12 from "mpls" to "Trust" "Any" "Voice LAN" "VOIP->1025-UDP" permit log count traffic gbw 256 priority 0 mbw 1000 dscp enable value 46
    set policy id 12 application "IGNORE"
    set policy id 12
    exit

    We are running ST 8.1 13.23.6910, with a HQ server and a DVM server at location #2.

    Sample 119 error:

    Switch isgpri01pdx: Excessive number of packets lost from 10.11.2.50 (359 out of 19549).
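
    For scale, the sample event above works out to a bit under 2% packet loss. A quick Python check (the helper name is hypothetical; the two numbers come straight from the event text):

```python
def loss_percent(lost: int, total: int) -> float:
    """Return packet loss as a percentage of all packets in the stream."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * lost / total


if __name__ == "__main__":
    # Figures from the sample 119 event: 359 lost out of 19549.
    print(f"{loss_percent(359, 19549):.2f}%")  # 1.84%
```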

    Is there something I am missing that would cause these random 119 errors? They don't appear during times of high bandwidth usage, and they even happen on the local LAN. I have tried pretty much everything; any suggestions would be greatly appreciated.

    TIA.

    Derek
    Last edited by derekhill; 03-30-2009, 11:20 AM.

  • #2
    You are doing inter-vlan routing on your SSGs?!? What model (5, 140, 3xx, 5xx)?

    Are you doing VPN tunnels over the MPLS connection (trying to figure out why the SSG was implemented and not a J-Series router)?


    Are you using the Foundry (Brocade) GS switches?



    • #3
      Chris,

      We are using a pair of SSG 140s at the HQ site running in active/passive HA mode and a single SSG 20 at the DVM site. Between the 2 sites we do not have a VPN tunnel (we are using the MPLS link); however, we do have some VPN connections from the sites to other remote sites (there are no phones there). Besides the VPN tunnels, the main reason we chose SSGs over J-Routers was that we needed AV, web filtering and DI for our policies, as well as stateful inspection. This is what our Juniper SE recommended and we followed his advice.

      You are correct when you ask if we are using the SSGs for inter-VLAN routing. At the HQ site, the Foundry FGS 448 POE connects to the GB ports on the 140s; at the DVM site, the Foundry is connected to the 10/100 ports. Both the Data and Voice VLANs are in the trust zone. We have 2 additional zones for wireless (public and private) as well as an mpls zone, in addition to the untrust and vpn zones. We currently have 2 connections from our ISP (one public, and one private 10.x.x.x that connects us to the private side of the MPLS).

      Our edge routers are managed by our MPLS provider/ISP, and we do not have access to them. I have had many tickets open with them to see if we are having QOS problems; we are not dropping any packets out of the EF queues and are seeing those queues increment, so I am assuming that QOS is working as it is supposed to.

      My guess is that we have sessions timing out, or the ALG portion of the voice protocol is not being passed correctly. I did make a change this afternoon on the voice -> mpls zone policy to ignore the application layer, and this seems to have fixed the 119 events. However, callers are still complaining about low volume and the call center agent cutting out when the call comes in at the HQ site and the agent is at the DVM site. While I don't seem to have 119 events anymore (hoping it stays this way), I am now seeing 3311 events immediately followed by 3310 events. These show up not as warnings but as informational events in the HQ server event logs (3311 - DVM disconnected from HQ server, 3310 - reconnected).

      Your insights are greatly appreciated.

      Regards,

      Derek



      • #4
        We've run into voice quality issues that required us to turn off spanning-tree on our voice vlan. Once that was turned off the voice quality greatly improved and the excessive packet loss went away.



        • #5
          Thanks for the hint. I was under the impression that the spanning tree only gets turned off on the ports that connect to the ST switches, but not on the ports that go to phones and server. If you look at the config posted in the first post, that is the way it is today. Are you saying to turn off spanning tree on all the ports? If so, what effect will it have on the rest of the network?

          Derek



          • #6
            Make sure it's turned off on just your Voice VLAN/voice ethernet ports. You want to make sure Spanning Tree isn't interfering with your traffic on the voice side. You can keep Spanning Tree running on your data side. Just keep in mind you want to keep your Voice VLAN as "untainted" as possible. It doesn't take a lot to negatively impact voice.



            • #7
              What have you defined this service ("VOIP->1025-UDP") as?



              • #8
                I have turned off spanning tree on the VLAN11 (Voice VLAN), but have left it on VLAN10 (Data VLAN).

                As for the service VOIP->1025-UDP, it is source any, destination port 1025 and up, protocol UDP. This is to address the shortcoming of the HQ server not tagging packets with DSCP values, in our case those from the queues going to the agents. This is what ST had me do, and I have also seen lots of posts referencing this in appnote ST-0130 pg 6.

                This policy is used to set the priority of the traffic to highest, to add the 46 DSCP tag for our MPLS provider's QOS, and to set bandwidth guarantees and limits.

                I definitely appreciate all the helpful troubleshooting steps. More eyes are always better than a pair or two.

                I am going to see how these changes affect the voice quality and network for the next couple of days.

                If there are any other suggestions, keep them coming. At this point I am willing to try pretty much anything, but only 1 or 2 changes at a time.

                Derek



                • #9
                  There are a few things in this model that are of concern:

                  The SSG (ScreenOS) is not a good candidate for QOS with VOIP. ScreenOS has always struggled to do it properly for real-time traffic and DOES NOT prioritize based on the DSCP value. ScreenOS can mark traffic with a DSCP value. Even with the carrier prioritizing, it only does so on the order of the packets received (QOS in the cloud is important for shared infrastructure versus private point-to-points). Your firewalls are treating all traffic equally on the egress on each side. That policy is also marking ANY traffic in the UDP port range as EF traffic, which is not a good practice: other non-RTP Shoretel traffic is being marked as 46 (for example, signaling on UDP 5440-5446), as is any other UDP traffic in that range, which streaming media typically is.
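
                  To illustrate how broad that match is, here is a small Python sketch of the service as it was described earlier in the thread (UDP, any source, destination port 1025 and up). The function name and the sample flows are made up for illustration; only the 5440-5446 signaling range comes from this post.

```python
def matches_voip_service(protocol: str, dst_port: int) -> bool:
    """Mimic the 'VOIP->1025-UDP' service: UDP with destination port 1025 and up."""
    return protocol.lower() == "udp" and 1025 <= dst_port <= 65535


if __name__ == "__main__":
    # Hypothetical sample flows: (description, protocol, destination port).
    flows = [
        ("signaling, low end of 5440-5446", "udp", 5440),
        ("signaling, high end of 5440-5446", "udp", 5446),
        ("arbitrary high-port UDP stream", "udp", 40000),
        ("DNS query", "udp", 53),
        ("TCP on a high port", "tcp", 5440),
    ]
    for description, protocol, port in flows:
        marked = matches_voip_service(protocol, port)
        print(f"{description}: {'marked EF (46)' if marked else 'not marked'}")
```

                  Every UDP flow to a high port gets the EF marking, so signaling and unrelated UDP traffic share the queue with the actual voice media.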

                  We've seen a lot of failed attempts to use SSGs in voice deployments (they will never do QOS like the routing platform does, especially as Juniper is continuing to port ScreenOS into JunOS). They work well for perimeter protection to the Internet. JunOS Enhanced Services provides true stateful firewalling with ALG. You are correct that JunOS (except SRX) does not currently provide AV and WF.

                  Proper design (and your Juniper SE didn't explain this properly) would separate your firewall for the Internet (and with it, effectively, your DI/WF/AV) from your private WAN infrastructure (this should be a J2320 on each side). If you need DI on the inside, you should be looking at an IDP product.

                  With your SSGs doing inter-vlan routing, you might be stressing the capabilities of the box, especially as you turn on DI, AV (big CPU cycle hog) and WF. Consequently, packets might be dropped as the SSG is attempting to process the packets through your zones.

                  Hopefully you've turned on QOS on your GS switches as well. In the GS line, it is pretty simple to turn on trusting of the DSCP bit (and it will auto map to the correct queue). You should REALLY move your inter-vlan routing on to one of the GS switches. They support doing light layer 3 (intervlan routing and static routes).
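
                  As a rough illustration of that auto-mapping, the Python sketch below assumes the switch groups DSCP values in blocks of eight per priority level (0-7 to priority 0, 8-15 to priority 1, and so on). That grouping is an assumption here, not something confirmed in this thread, so check the actual map in the switch documentation. The function name is hypothetical.

```python
def dscp_to_priority(dscp: int) -> int:
    """Map a 6-bit DSCP value to a 0-7 priority, assuming blocks of eight DSCP values per level."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp >> 3


if __name__ == "__main__":
    print(dscp_to_priority(46))  # 5 (EF voice traffic)
    print(dscp_to_priority(0))   # 0 (best-effort data)
```

                  Under that assumption, EF-marked voice lands in a higher-priority queue than unmarked data without any per-port classification rules.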

                  Until the QOS is resolved on the egress of each of your SSGs, you will still likely run into problems as congestion occurs. I'd recommend you split your SSGs off to just handle the Internet, move inter-vlan routing on to your GS, and implement J2320s for your MPLS connection between the sites. You can then write a proper ACL to prioritize traffic from the Shoretel server.

                  In Shoretel 9.0, proper DSCP marking now occurs from Director.



                  • #10
                    Chris,

                    Thanks for your suggestions. I will work on a redesign of our infrastructure to see how we can make this work by using the FGS as a layer 3 switch, or by adding some Junos routers to the mix to handle the other duties. I will contact our Juniper SE to get some ideas from them as well.

                    Regards,

                    Derek

                    PS: When is ST 9.0 coming out....?



                    • #11
                      Let us know if you need some guidance; we are a Juniper Elite partner and hold the Implement Specialist designation.

                      Shoretel 9.0 is now in Controlled Release.



                      • #12
                        Here is an update:

                        On 3/30, I turned off spanning tree for the entire Voice VLAN, and this seems to have been the winning solution. Thanks dsirek for that suggestion; all the previous documentation I had read said to only turn off spanning tree on the ports the ST switches are connected to. All of my 119 errors have gone away (knocking on wood).

                        In addition, I have configured the Foundry switches to "trust dscp" - thanks Chris. This change was made on 4/1. One additional change made on 4/1 was to turn off all of the ALGs on the SSG 140s and SSG 20s. I was at a Juniper STRM class (great product, by the way), and one of the SEs attending the class suggested I turn those off, as he had seen lots of VOIP quality issues with them turned on.

                        We are still looking at throwing a couple of JUNOS routers into the mix to move the inter-VLAN routing off the SSGs, even though I do like being able to put the different VLANs into different zones for security purposes. I am working with our local SE to come up with a "best solution" that will not break the bank and will not compromise our security.

                        Once again, a big thanks to all of you who had suggestions to contribute to the fix. This forum is great, and I hope to be able to contribute my knowledge to solving others' problems. I have been a ST customer with different customers since version 4 (blue Shoreline switches) and have implemented several of these systems; this was my first "multi site".
