Niteman9

Ethernet Networking


I have a system with eight ControlLogix 1756-L62 processors. Each cell also has between two and five PanelView Plus 1250s and one Pilz safety processor on the network. I am checking a heartbeat signal from one PLC to the next, so the first PLC interlocks to the second, the second to the first and third, and so on. My heartbeat watchdog timers usually reset in under 100 ms. Occasionally they will not reset for several seconds; the longest I have recorded is 9.8 seconds.

The PLC IPs are 192.168.2.100, 192.168.3.100, and so on through 192.168.9.100, with a subnet mask of 255.255.240.0. The PanelViews for each PLC are addressed 192.168.2.101, .102, .103 (and likewise for the other cells), with a subnet mask of 255.255.255.0. The eight PLCs are linked through a MOXA EDS-516A managed switch. So, based on what I know, the PanelViews for each cell cannot communicate with the other PLCs.

My company had a "network engineer" come and check it out, and his suggestion was to put everything on the same subnet by changing all the IPs so the third octet is the same. I think this will increase traffic on the network and cause more issues. Does anyone have any suggestions?

Well, I am definitely not a "network engineer" - with or without the quotes - but I do know a little about TCP/IP addressing that may help.

Unless your managed switch has been configured otherwise, all the PLCs are 'talking' to each other anyway. In fact they have to be for the heartbeat setup to work. There would be no change from altering the third and fourth numbers in the addresses the way you mention, because of the current subnet mask. The PLCs' mask is 255.255.240.0, which lets all of them communicate with each other regardless of the third number in the address, as long as it is 15 or lower. With the typical mask most people are familiar with, 255.255.255.0, all the devices have to have the same first three numbers in order to communicate. With 255.255.240.0 you open up variation in the third number as well: any device with a third octet from 0 to 15 can communicate freely.

The PanelViews have a mask of 255.255.255.0 and therefore cannot see a PLC whose address does not match them in the first three numbers (192.168.2.x in your example). I would suspect that the PanelViews for each PLC have a third number that matches the PLC address. This all seems a strange way to isolate a system, but then again I am not a "network engineer." I would think that the managed switch would be the better place to manage communications and how the devices interact.

You did not mention how the PanelViews talk to the PLCs. Do they go through the same managed switch? That may shed some light. I will go out on a limb and say that I do not think changing your addressing scheme will speed up your network. I think you need to sit down with the network engineer and have him explain (repeatedly if necessary) why the change he suggested would affect things positively.
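To make the mask arithmetic concrete, here is a quick check using Python's standard ipaddress module (a sketch only; the addresses are the ones from the original post):

```python
import ipaddress

def same_subnet(ip_a, ip_b, mask):
    """True if two hosts land on the same network under the given mask."""
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b

# Two PLCs under the 255.255.240.0 (/20) mask both fall in 192.168.0.0/20,
# so they already talk to each other directly.
print(same_subnet("192.168.2.100", "192.168.9.100", "255.255.240.0"))  # True

# A PanelView with a /24 mask looking at a PLC in another cell sees a
# different network, so it would try to go through a gateway it doesn't have.
print(same_subnet("192.168.2.101", "192.168.3.100", "255.255.255.0"))  # False
```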

Several:

1. The masks simply define whether a particular node will attempt to send a packet directly to a device on the same Ethernet segment (using ARP to resolve the IP address to a MAC address), or send it to the gateway instead. They do not control packet traffic in any other way. So although the subnet masks here are fairly creative, they don't do anything else useful.

2. Are there any other external links? If you have external broadcast traffic, you are getting bombarded.

3. Have you gone packet sniffing yet?

4. Do you have an IGMP querier in the system?

Thanks for the replies, all good info. The network is set up as follows: each PLC panel has an unmanaged switch which connects the PLC, the HMIs, and the safety PLC. Each of these unmanaged switches is connected to the managed switch. There are no other connections on the network other than laptops used to access the PLCs. The idea in setting up the network this way was to allow the HMIs to communicate only with their one PLC. No, I haven't gone too deep into the managed switch; the leader of this project is the one who set it up, and he is no longer with the company.

Also, the whole network doesn't go down when we have a problem, just one link between two PLCs. Yesterday it only happened twice between two PLCs and once between two other PLCs in a 12-hour production run. One recorded 9.8 s with the heartbeat off and the other was 5.2 s. So far today the max time on any link is 380 ms. The "network engineer" they brought out is the IT guy from the office; I'm not sure he has any industrial experience.

I think you need to look into the managed switch a little more. See how it is configured and what kind of diagnostics it has available; that may be enough to track down the problem. If nothing comes of that, I would try setting up a VLAN for each PLC and the line coming in from its PVs and Pilz, plus an overlapping VLAN for the PLCs themselves. That would truly isolate the PVs and Pilz from the other PLCs.

Also, is the problem just one PLC timing out, or does it vary from PLC to PLC? There could be a hardware, configuration, or cabling issue associated with that particular PLC.

Without digging real deep into how large the PLC systems are or the specifics of transactions and drivers, I can offer some basic things to look at.

RPIs and tag update times: make sure update intervals are not all set to the same value or to multiples of one another; use prime numbers. Set items that don't need frequent updates to longer times. The default RPI on I/O can be 20 ms, and just because you can doesn't mean you should. I have had devices all set to the same interval, and every couple of hours there was a comms storm and everything locked up.

Drivers: I have seen some third-party drivers cause comms irregularities.

In controller properties, under the Advanced tab, you can adjust the time slice the processor devotes to comms, or use a fixed amount for comms. This needs to be adjusted and watched to ensure no adverse effects occur. I have used it in systems with large communications overhead and minor control requirements.

Download some freeware like Wireshark, monitor the packets, and see what is causing your delays. It takes patience, and then more patience. If you can, break the network apart and watch the comms packets; basic troubleshooting applies, so halve the network and see where the problem goes. You could be surprised that some unknown device is causing you grief. I have had operators plug in computers, and Ethernet loops through unmanaged switches, and both caused all kinds of grief. Good luck.
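To illustrate the point about update intervals that are multiples of one another, here is a rough sketch (plain Python; the RPI values are made up) of how often a set of connections all fire at the same instant:

```python
from functools import reduce
from math import gcd

def lcm(values):
    """Least common multiple of a list of update intervals, in ms."""
    return reduce(lambda a, b: a * b // gcd(a, b), values)

# Everything left at the same default: all four connections hit the wire
# together every 20 ms, so the traffic arrives in synchronized bursts.
print(lcm([20, 20, 20, 20]))   # 20

# Staggered, roughly prime intervals: all four connections only line up
# once every 765,049 ms (about 12.8 minutes).
print(lcm([23, 29, 31, 37]))   # 765049
```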

Describe your "heartbeat" logic. I've seen dozens of different logic mechanisms used as "heartbeats".

I prefer to use the actual status of a produced/consumed connection between controllers, by using a GSV instruction to read the EntryStatus of a Module object (the Module being the remote CPU). The only real "heartbeat" I use is an RSView derived tag that writes the System Seconds from my RSView stations down to a tag in the controller. On the controller side, if the tag holding that System Seconds value stays the same for more than 2 seconds, I know the RSView station has been disconnected or its program has stopped.

For an idea of why some heartbeat logic doesn't work as expected, see this: http://xkcd.com/165/

Edited by Ken Roach
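The stale-value check on the controller side amounts to logic like this (a minimal Python sketch of the idea, not Logix code; the names and the 2-second limit are placeholders):

```python
import time

STALE_LIMIT_S = 2.0            # alarm if the written value sits unchanged this long

last_value = None
last_change = time.monotonic()

def station_alive(current_value):
    """True while the remote station keeps changing the value it writes;
    False once that value has been stale for longer than STALE_LIMIT_S."""
    global last_value, last_change
    if current_value != last_value:
        last_value = current_value
        last_change = time.monotonic()
    return (time.monotonic() - last_change) <= STALE_LIMIT_S
```

The point of watching a value the other end actively writes, rather than toggling a bit locally, is that a frozen station or a dropped connection shows up as a stale value instead of going unnoticed.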

OK, what you are describing is a case of jitter: you are getting responses, but the time delay in which they happen is random and apparently too large to be acceptable. When dealing with Ethernet, there are three inter-related concepts to keep in mind: latency, throughput, and jitter. They are inter-related in that all three tend to rise and fall in unison, but they are not the same thing. Unfortunately, most network engineers deal only in best-effort networks where jitter and latency are not considered at all. PLCs, however, are much more susceptible to problems with latency and jitter.

Throughput is raw packets per second. You can measure it at the wire level in terms of bandwidth, or bits per second. However, this does not take into account the considerable timing delays built into the Ethernet protocol itself. When those delays are considered, the size of the packets becomes far less important (especially on 100 Mbps links) and the total number of packets per second becomes the dominant concern. More importantly, although Ethernet switches and most PCs can handle upwards of 20,000 packets per second or more, the typical ENBT module can only handle about 5,000 packets per second, and PowerFlex drives can only handle about 600 packets per second. If you exceed what the Ethernet interface card can physically process, the NIC (hardware packet receiver) on the ENBT will easily receive the full bandwidth that Ethernet can dish out, but the processor behind it will crash and cause random, unexplained behavior similar to what you are describing. So keep in mind that throughput is one of the things you need to be checking.

Second, there are connections. Except for UCMM (unconnected messaging), EtherNet/IP works by allocating connections at each end of a communication stream. Even if you use CIP MSG instructions, they still cache connections. If you exceed the total number of connections a device can handle, it will start dropping cached connections, which causes random delays as the connections are dropped and re-formed. Now, it may seem that UCMM is a way out, and that is partially true. However, UCMM packets are buffered and treated with a fairly low priority relative to connections, so all UCMM traffic experiences delays that are a factor of both the amount of UCMM traffic AND the amount of connected traffic.

Third, there is jitter. As throughput increases, you eventually end up in a situation where two or more packets are bound for the same destination at the same time. When this happens, all the packets except one are stored in buffers and wait until there is space to transmit. How the packets are selected can be controlled somewhat via the priority mechanisms the switch offers, but the queuing still happens. The overall effect is that the average delay in sending a packet increases because of the amount of waiting going on, and the delay on a per-packet basis becomes random. The spread in that random delay gets larger as the throughput increases. This is jitter. By itself it usually isn't all that noticeable, unless for instance you are running VoIP (voice over IP) and start noticing stuttering and silences. For a PLC it is definitely noticeable if you are doing something high speed such as motion control.
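Before going further, a rough feel for that packet budget (a back-of-the-envelope Python sketch; the connection count and RPIs are made up, and the ~5,000 pps figure is the ENBT limit quoted above):

```python
def packets_per_second(rpi_ms, connections):
    """Request packets only (one direction); replies roughly double the total."""
    return connections * (1000.0 / rpi_ms)

# 50 I/O or produced/consumed connections left at a 10 ms default RPI:
print(packets_per_second(10, 50))    # 5000.0 -> already at what an ENBT can take

# The same 50 connections at 100 ms leave a 10x margin:
print(packets_per_second(100, 50))   # 500.0
```

Compare numbers like these with the actual packet counts on each module's web diagnostics page, as suggested later in this post.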
If the buffer space on the switch is exhausted, it throws away the excess packets. EtherNet/IP I/O operates over UDP/IP, so the result is that you lose some updates here and there. Not usually a problem, except that you'll notice jitter massively increasing. In TCP/IP on PCs, packet loss is monitored. When a packet is lost, it is assumed that the error rates of most communication systems are so low that it is probably not a real error; instead the assumption is that bandwidth is exhausted, and the TCP/IP stack automatically chokes back on the transmission rate. In the meantime, timers automatically attempt to resend the lost packet at a later time. Needless to say, when this happens the jitter is usually really bad, because it takes a while for the retransmission to occur. This is all well and good in a normal PC environment; for PLC I/O, it's a disaster. EtherNet/IP, especially for something like I/O where regular response times are expected, is designed around the assumption that jitter is bad and retransmissions are bad. So with a PLC, the goal is to operate with NO bandwidth issues; this provides a deterministic (that is, predictable-delay) operating environment. Clearly if bandwidth is an issue, you've run into this brick wall. You don't have to get there all at once, though. All it takes is enough heavy traffic to push the switch to the point where it starts dropping packets.

Now that we've covered all the horrible things that can happen (and probably are happening), you can see how important it is to troubleshoot your network and optimize communications. Let's start with the basics:

1. Make sure all your connections are good. Remember, 100 Mbps Ethernet operates with a bandwidth from down close to 100 Hz up to about 33 MHz, but the cabling is designed to pass up to 100 MHz, and in reality you need the extra bandwidth to keep good clean waveforms (and avoid noise issues). That's enough bandwidth to cover almost the entire broadcast radio spectrum, plus shortwave, plus several television stations (at least in the old VHF bands), plus a lot more. It is absolutely critical that you terminate your RJ-45 connectors properly, because you are dealing with RF-level signals. Shoddy workmanship simply can't be allowed, and slamming cables in doors isn't allowed either. Make sure there is also NO CAT 3 cabling; CAT 5E or better is preferred. There is no inherent advantage (and never will be) to using CAT 6 either: 1 Gbps Ethernet is designed for CAT 5E, 10 Gbps Ethernet works marginally on CAT 5E and no better on CAT 6, and the upcoming CAT 7 standard is for 10 Gbps.

2. No hubs. I repeat, NO HUBS. No 10 Mbps switches either. Fix both of the above before going further. Ethernet cables smashed in doors are not acceptable, and pigtails hanging out of RJ-45 connectors aren't either. Fix all this before you do anything. Note that PanelView Plus boxes will literally crash and give you all kinds of grief just because of a bad connection. Their OPC drivers are junk, and there is nothing we can do about it unless you switch brands to something more reliable. As long as you always have good connections, you won't have this issue.

3. Now, you need some tools to do an effective job:
A. A laptop. One with two Ethernet ports is very useful, because you can keep a regular connection up and use the second port strictly for packet capturing, but this isn't strictly a necessity; I've never found a single port to be a major limitation, since Ethernet troubleshooting sessions tend to be long but rare.
B. Several Ethernet cables, preferably of different lengths.
C. An RJ-45 crimping tool, several extra connectors, and a spare box of CAT 5E.
D. At least one crossover cable. Mark it. Do not let it out of your sight, and do not let it get used for anything except direct PC-to-device connections when troubleshooting.
E. A copy of Wireshark. It's free, and AB recommends it. www.wireshark.org
F. A copy of netscan (www.softperfect.com/products/networkscanner). Again, it's free.
G. At least one small managed switch in your arsenal, or all managed switches in your network.
H. A copy of the IP address map of interest, and your particular IP numbering plan.

First, go to each of your EtherNet/IP devices with a web browser and look at the diagnostic data. Check the number of connections and the packet-per-second rates. Make sure there is at least some spare capacity on each so that you have room for troubleshooting. This should tell you whether you are truly having overload problems. If you are, there are several possible sources:

1. RPIs that are set very badly. They default to 10 ms, which is ridiculous in most instances; it equates to 100 packets per second PER I/O card. See if you can't decrease these to reasonable values before doing much else. Human reaction times are about 350 ms at best, so there is no excuse for scanning push buttons faster than about half of that, say 150 ms. Temperature readings are even slower, and 1000 ms RPIs are often more than adequate.

2. Poorly configured HMI (PanelView) traffic. I actually suspect this in your case. If you configure PanelViews the usual way and simply access each tag as-is, they will generate one packet (a request for data) for EVERY tag and get one packet in response for EVERY tag. Each PanelView is independent, so if they all refer to the same tag, you get the idea. There are two exceptions. If you group tags together into a single array or a UDT, you get one packet for the entire GROUP. The second exception is a "tag server": if you use a PC running RSLinx Gateway and point all the HMIs at the PC, the PC will grab the tags from the PLC ONCE and the HMIs will get them from the PC as needed.

3. Multicast traffic turning into broadcast. Decide whether you have multicast traffic first: if you use producer/consumer tags or remote I/O, then you have multicast traffic, and if you don't manage it, it turns into broadcast traffic. The way multicast works is that a special set of IP addresses is allocated as multicast groups. Multicast producers send packets with a multicast group as the destination IP address. Multicast receivers receive packets by monitoring for the addresses they are interested in. There can be as many or as few multicast receivers as needed (even zero). This is different from broadcast packets, which are sent to ALL receivers on the same LAN (or VLAN), and from unicast packets, which are sent to one and only one receiver. Thus multicast can act like zero-cast, unicast, broadcast, or anything in between. We need to stop here a little and talk about HOW packets are routed on Ethernet.
When a switch first powers up, it monitors all incoming ports for packets. When it receives a packet, it sends one copy out EVERY port on the switch except the one it was received on. Since Ethernet is logically (although not necessarily physically) organized as a tree, the packet is broadcast in short order to every port in the network. This is hardly optimal, so the second thing the switch does is LEARN addresses. When it receives a packet, it also stores the source MAC address and the port number in a table. The next time it receives a packet, it looks at the destination. If the destination is a broadcast, it does the normal thing. If the packet is unicast, it checks for the MAC address in the table; if it is present, then instead of broadcasting the packet, the switch sends it ONLY to the port that holds the destination MAC. As long as there is two-way traffic (that is, a device both sends and receives), the switch will eventually learn all the MAC addresses and properly route the packets as unicast traffic.

When all this was originally developed, multicast wasn't really thought of. So when a switch sees multicast traffic and is not designed to route it (or hasn't learned a route), it simply treats it as broadcast. Multicast frames are purposely marked with special destination MAC addresses that no real device owns, and since no device ever sends a packet with a multicast address as a SOURCE, non-multicast-aware switches simply broadcast them. Routing (however inefficient) still works.

In order to LEARN the proper multicast routing, a special IP protocol, IGMP, is used. Periodically one device on the network broadcasts an IGMP Query packet. All multicast receivers on the network respond with an IGMP Report packet containing the list of multicast addresses they are receiving. The actual protocol is more complicated than this (it includes join/leave messages as of version 2, and it has methods for breaking up the list of addresses), but the basic idea is exactly as described. The IGMP querier is not really involved in the process in any other way; it just provides the IGMP Query beacons. Multicast-aware switches passively (or actively) monitor for IGMP Report packets. When they see one, they use the information in the packet, in almost exactly the same way as the passive unicast learning mechanism, to build a table of multicast destinations and ports. Simple passive monitoring is called IGMP snooping. IGMP is a "layer 3" (IP) protocol, but many industrial unmanaged (dumb) switches support IGMP snooping even though they are technically only "layer 2" devices. With active snooping, the switch receives the IGMP Query and then broadcasts its own IGMP Queries downstream; it aggregates the results and sends its own IGMP Report back upstream, which reduces the amount of IGMP query/report traffic.

With that in mind, if you have multicast traffic and a lot of non-snooping switches (or snooping isn't configured), or if you do not have an IGMP Query generator, all of your multicast traffic turns into broadcast traffic. Since you have 8 PLCs, if you are using producer/consumer tags for your heartbeats, each produced tag gets delivered to EVERY PLC, not just the intended receivers. If you also have remote I/O, this can very quickly turn into a total morass of packets, which can cause chaos not only in your PLCs but in your I/O as well.
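The flood-until-learned behavior is easy to model (a toy Python sketch, not how any real switch is implemented): a known unicast destination goes out one port, while anything the switch has no entry for, including multicast when nothing is doing IGMP snooping, goes out every port.

```python
class ToySwitch:
    """Minimal model of a learning switch."""

    def __init__(self, num_ports):
        self.ports = list(range(num_ports))
        self.mac_table = {}                      # learned source MAC -> port

    def handle_frame(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port        # learn where the sender lives
        if dst_mac in self.mac_table:            # known unicast: forward to one port
            return [self.mac_table[dst_mac]]
        # unknown unicast, broadcast, or multicast with no IGMP snooping: flood
        return [p for p in self.ports if p != in_port]

sw = ToySwitch(num_ports=8)
# A produced tag from PLC 1 to a multicast MAC no device ever uses as a source:
print(sw.handle_frame(0, "mac-plc1", "01:00:5e:40:12:34"))   # floods ports 1-7
# A unicast reply back to PLC 1, whose MAC the switch has now learned:
print(sw.handle_frame(1, "mac-plc2", "mac-plc1"))            # [0]
```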
You CAN get away with using unmanaged switches, especially if they have snooping capability, but it takes a little more management on your part to keep things under control. The problem tends to show up when you add up the number of packets on each connection (from the web pages of the Ethernet interfaces) and the numbers just don't add up; it will be even more obvious with Wireshark.

You can use Wireshark for some generic checks. For instance, plugging in anywhere in your network and scanning for IGMP traffic should show the expected query/report traffic. But to truly take advantage of it, you need a small managed switch, or use the one you have: configure port mirroring to copy all the traffic to/from a particular port of interest and scan away. The statistics modes are almost as useful as the individual packets.
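Once port mirroring is set up, a handful of display filters cover most of what matters here. A hedged starting set, collected as Python strings purely for reference (paste them into Wireshark's filter bar one at a time; the ports are the standard EtherNet/IP ones, TCP 44818 for explicit messaging and UDP 2222 for Class 1 I/O, and the filter names assume current Wireshark dissectors):

```python
# Wireshark display filters worth keeping handy on a mirrored port.
WIRESHARK_FILTERS = {
    "igmp_activity": "igmp",                                # are Query/Report beacons present?
    "cip_explicit":  "cip || tcp.port == 44818",            # connected/unconnected CIP messaging
    "class1_io":     "enip && udp.port == 2222",            # produced/consumed and remote I/O
    "tcp_trouble":   "tcp.analysis.retransmission || tcp.analysis.lost_segment",
    "broadcasts":    "eth.dst == ff:ff:ff:ff:ff:ff",        # broadcast storms / flooded traffic
}
```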

Paul! Man! Dude! Wow! Not just a response - that was a whole ethernet dissertation! Once again a great post. Thanks for the education. I think I need to hook up and upload your brain. Not sure I have the available memory though... thanks, russell

Paul, thanks for all the information. It is more than I ever expected, and I am still processing all of it.

All of the PLCs' RPIs were set at 20 ms; I increased each to 150 ms. We are not sending much data from PLC to PLC, only one UDT to each processor upstream and downstream, so two UDTs is the most we are sending over Ethernet. When I changed the RPIs, my heartbeat watchdogs went from about 50 ms to 200 ms, which was expected. For the HMIs we are using UDTs. I checked all of the EN2T/B cards and all are under 300 pps, with the limit being 20,000 pps, so I do not think we are overloading them.

The thing I am having the most trouble understanding is that we will go 3 or 4 hours with the watchdog timers never going above 500 to 700 ms, then get one occurrence of 4, 5, or 6 seconds, and then no issues for the next few hours. Our IT department tested all of the cables which connect the PLC panels to the central managed switch, and all tested OK. I feel it is something in the central managed switch that is causing the problem, but I have no proof of this.

I'll ask again: How, exactly, do your "watchdogs" work?

There are two types of CAT 5E cable testers. One is a TDR-based certifier that actually measures termination quality; it will give you information such as NEXT and FEXT statistics. Usually these are logging types, used specifically to document cable installation quality on large contracts, and the cheapest models are in excess of $5,000 US. The other type simply does continuity tests and verifies that all the pins are terminated the right way, and you can buy one for under $200 US.

The first type is the only one that gives any sort of useful information, but it costs a fortune, and visual examination of the terminations is just as good in 99.9% of cases. The second type is essentially worthless, because you can get the same information by plugging in a device and seeing whether or not the cable "works". It's only good as a quick check for an electrician to verify that all the pins seated correctly and in the right order. There is no measurement of "quality"; it's a go/no-go test that duplicates what the NIC cards are going to tell you anyway.

So be suspicious of anyone who "tests" a cable for quality. Only one of those tests gives a quality indication that can separate marginal situations from good ones. The other only detects bad cabling, and is no better than the Ethernet diagnostic counters built into your communication cards.

Great info! Just wanted to mention that processor firmware version 18 adds the capability to set up unicast connections. I'm not sure if the PVs support this yet, but the processors definitely do. Might be worth the upgrade?

Yes, it is worth the upgrade for specific situations. No, it won't affect PanelViews at all.

The EtherNet/IP protocol itself does not force you to use unicasting or multicasting. It also doesn't force you to use UDP or TCP. Any combination will work, but obviously some combinations are not appropriate (TCP plus multicasting doesn't make sense). All AB I/O has implemented all of the options. The advantage of multicasting is being able to send data to two or more devices simultaneously while minimizing the load on the source device, because it only has to transmit a single packet. The main disadvantage is that if your network does not properly implement multicasting, the traffic turns into broadcasting, which can easily overwhelm the system and cause problems. If you've set things up correctly, then for a single source and a single destination the differences between multicasting and unicasting don't matter for all practical purposes.

How can you eliminate multicasting? Simple:
1. No producer/consumer tags. CIP MSG'ing is OK.
2. No remote I/O over Ethernet.
And if you can set the unicast bit (they've been talking about this since Rev 16), there is a third option:
3. If you have remote I/O, you cannot share I/O between two PLCs (no "listen-only" type connections) and you must check the "unicast" check box.

Note that with option 3 available, 99% of all PLC installations I've ever encountered would work. Multicasting is very slick, but it simply isn't needed in most cases.

The reason the PanelView Plus doesn't matter is that RSLinx Enterprise, RSLinx Classic, and Kepware OPC servers all poll the PLC using CIP MSG'ing. They don't use producer/consumer tags, so they don't multicast at all. It's a bit of a shame, because the OPC model would be ideally suited to multicast since it supports "change of state" updates.
