PLC_man_Stan

ControlLogix Ethernet Network

19 posts in this topic

We have an Ethernet network in our plant which all machines are connected to. Each machine has around 20 nodes on it, ranging from Flex I/O to a SCADA PC. At present each node on each machine has an IP address in the range 10.156.1.1 to 10.156.14.77, and all subnet masks are set to 255.255.0.0. This currently means that every node on this network can communicate with every other node. We had an issue last week where an input card faulted and dropped out the whole network; we lost connections to every node on the network. It was a good day to be the Controls Engineer on site. I fixed the problem by unplugging the faulty machine's main switch from the network.

Anyway, my issue now is that I would like to give each machine its own subnet, with an extra 1756-ENBT module in each rack which will then communicate with the engineering network. This will hopefully isolate the machine subnet, which will continue to communicate (to the 20 or so nodes) even if the engineering network dies. I have set up a rig in my office with two 1756 racks, each with an L61 and two ENBT modules. I also have two Flex racks. I want to have one ENBT module talking to the Flex I/O and one ENBT module talking to the engineering network. This should give me comms to my CPU from the engineering network, and if I lose the engineering network the Flex I/O will still talk to the CPU over the other ENBT. Does anyone else have a plant set up in this way, and are there any rules I should stick to when assigning IP addresses/subnet masks?
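For concreteness, one quick way to sanity-check a per-machine subnet plan like this is with Python's ipaddress module. The /26 ranges below are placeholder examples, not addresses actually assigned on site; the point is just to confirm how many hosts a 255.255.255.192 mask gives each machine and that no two machine subnets overlap.

```python
# Sanity-check sketch for a per-machine subnet plan (placeholder ranges only).
import ipaddress

# The existing plant-wide network described above: 10.156.x.x, mask 255.255.0.0.
plant = ipaddress.ip_network("10.156.0.0/16")

# Hypothetical per-machine subnets, one /26 (255.255.255.192) per machine.
machine_subnets = [
    ipaddress.ip_network("10.156.60.0/26"),
    ipaddress.ip_network("10.156.61.0/26"),
]

for net in machine_subnets:
    hosts = list(net.hosts())
    # 62 usable addresses per machine - comfortably more than ~20 nodes.
    print(f"{net}: {hosts[0]} .. {hosts[-1]} ({len(hosts)} usable hosts)")

# No two machine subnets should overlap each other.
for i, a in enumerate(machine_subnets):
    for b in machine_subnets[i + 1:]:
        assert not a.overlaps(b), f"{a} overlaps {b}"

# These ranges still fall inside the plant /16; that is only safe here because
# the machine-side ENBT is not bridged onto the engineering network.
print([net.subnet_of(plant) for net in machine_subnets])
```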

I'm sort of shooting in the dark here and may be totally off base, but I'm going to say it anyway... What is your network setup right now? I can't think of the correct term, but is your network set up in parallel (like a spider web) or in series (a ring)? If it's a ring network, I can see how one node falling out could drop the entire network. This depends on how you unplugged the faulted node. Did you unplug and reconnect the cable to the network, or just unplug it and leave it hanging? Again, shooting in the dark; I haven't done this enough to do real troubleshooting without being there to see your system.

So basically, a failing I/O module caused your 1756-ENBT (in the same rack) to go haywire on the network dropping out all 20 machines on the network. WOW! I'm not sure adding an extra 1756-ENBT would be better. What makes you think the failing I/O module wouldn't have caused both 1756-ENBT's to do the same thing (assuming you go to that config)? An alternative might be to give each machine a 20+ port switch and give each machine its own subnet. This would have narrowed your issue down to just the one machine while keeping the rest of your network(s) alive.

I am curious: what kind of topology is your network, and what switches or routers are you using? Why would a faulted I/O card kill the whole network...

Thanks for the responses. At present we have a Hirschmann 1 Gb HIPER-Ring, and each machine (a group of 20+ nodes with various SPIDER 5TX-style switches) has a single link onto that ring. This is why, when I unplugged that single connection onto the ring, the ring corrected itself.

Going back to the two ENBT cards, I did some tests last night and it did work. As I only have a single Cat 5 from each group of nodes onto the HIPER-Ring, I'm going to plug that Cat 5 into an ENBT module set as 10.156.13.7, subnet 255.255.0.0, which is in the CPU rack. This means anything on my HIPER-Ring can talk to the processor. Then I set the other ENBT module to 10.156.50.1, subnet 255.255.255.192, and each node with the same subnet and 10.156.50.x (up to 62). This means from the HIPER-Ring I can see 10.156.13.7 and get online to the processor, but I can't see any of the other nodes (to do that I would have to bridge the ENBT modules), so this is a good thing.

I set two systems up in the same manner, one using 10.156.50.x for its nodes and one using 10.156.51.x for its nodes. I still have my rogue input card, so I plugged it into one of the racks. I then witnessed the CPU in that rack error, the two ENBT modules crash and every node on the 10.156.50.x subnet lose comms. I also lost comms on the second ENBT module set to 10.156.13.7, which still crashed my HIPER-Ring. But the main point is that the other subnet on 10.156.51.x stayed intact, as there was no bridge between the two ENBT modules.

OK, so I think I have my answer to curing my machine-level problem, and this is the most important one because it has all of the control-level devices on it. Do people see any merits in changing to ControlNet? I'll get some info on our HIPER-Ring today and post it up so people can see, but I don't think it's anything special, just a bunch of managed switches with drops off it.
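As a rough illustration of why this isolation works (not specific to any particular switch setup): a sender only ARPs directly for destinations that fall inside its own subnet, and with no bridge between the two ENBT modules there is no path between the segments in either direction. A short Python sketch of that check follows; the PC and node addresses are hypothetical examples within the ranges described above.

```python
# Why the engineering network can reach 10.156.13.7 but not the machine nodes,
# given there is no bridge between the two ENBT modules.
import ipaddress

def on_link(src_ip: str, prefix_len: int, dst_ip: str) -> bool:
    """True if the sender would treat dst_ip as being on its own subnet."""
    src_net = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    return ipaddress.ip_address(dst_ip) in src_net

# A hypothetical PC on the HIPER-Ring with mask 255.255.0.0 (/16):
print(on_link("10.156.1.100", 16, "10.156.13.7"))  # True - same physical segment, works
print(on_link("10.156.1.100", 16, "10.156.50.5"))  # True - looks local, but the node
# sits behind the machine-side ENBT, so the ARP request is never answered.

# A hypothetical machine-side node with mask 255.255.255.192 (/26):
print(on_link("10.156.50.5", 26, "10.156.1.100"))  # False - would need a gateway, and
# none is configured, so the machine subnet stays isolated from the ring.
```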

1. I would not switch to ControlNet over EtherNet/IP; I do not see how you could justify the ROI.

2. Have you nailed down the root cause of the initial network failure? I would venture that either your Hirschmann ring failed to auto-negotiate a new path or the failed network module was bombarding the network with broadcast traffic. Have you contacted your local Hirschmann rep to get their input? They are the best control system network hardware manufacturer that I am aware of. This type of failure is huge and something their equipment should not be doing at all. There should be a concrete reason for the system failure.

3. Dig in and work with your equipment manufacturers and suppliers (A-B, Hirschmann) to ensure you are networking the equipment to the manufacturers' recommendations. Then take a look at adding some diagnostic handshaking to monitor network performance (heartbeat monitoring), along the lines of the sketch below.
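On the heartbeat-monitoring suggestion in point 3, a minimal sketch of what that could look like from the SCADA/engineering PC side. The read_tag/write_tag calls and the tag names here are hypothetical placeholders for whatever driver or OPC interface already talks to the controller; the PLC side would simply copy the heartbeat value into the echo tag each scan.

```python
# Hypothetical heartbeat monitor sketch - write a counter, expect it echoed back.
import time

HEARTBEAT_TAG = "SCADA_Heartbeat"       # hypothetical tag written by the PC
ECHO_TAG = "SCADA_Heartbeat_Echo"       # hypothetical tag the PLC copies it into
PERIOD_S = 1.0                          # how often the counter is written
TIMEOUT_S = 5.0                         # how long a stale echo is tolerated

def monitor(write_tag, read_tag):
    """write_tag(name, value) / read_tag(name) are placeholders for a real driver."""
    counter = 0
    last_good = time.monotonic()
    while True:
        counter = (counter + 1) % 32768
        write_tag(HEARTBEAT_TAG, counter)
        time.sleep(PERIOD_S)
        if read_tag(ECHO_TAG) == counter:
            last_good = time.monotonic()
        elif time.monotonic() - last_good > TIMEOUT_S:
            # The Ethernet path (or the controller) has gone quiet - raise an alarm.
            print("Heartbeat lost - check the network path to the controller")
```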

So his network has this capability? It seems your suggestion of bombardment is the most plausible. With one node going down, I can see it taking a few out if a new path failed, but not the entire ring.

I agree with IamJon and kaiser_will; my understanding of the purpose of a ring topology is to give the network traffic a second path. This, in a sense, gives you redundancy in the network. This is a big problem and, as Kaiser stated, you should give the hardware manufacturers a call. I still can't see why the whole network went down.

There are really two topics here: HiperRing, and this "rogue input module".

First, let's talk about HiperRing. The basic functionality is that it's a fast self-healing ring protocol, functionally similar to the token ring diagram above (but obviously it's not Token Ring). Spanning Tree Protocol and Rapid Spanning Tree Protocol are the most common ring protocols, and most Ethernet vendors have their own flavor of ring protocol, like N-Tron's N-Ring and Cisco's Resilient Ethernet Protocol. HiperRing is very good for enterprise networking and many automation networking systems, but in too many cases I see it described and sold and relied upon as a "redundancy" solution.

I've seen various ring-healing times cited, usually in the 50 to 300 millisecond range. My own experience with a HiperRing of this scope is that ring-healing times in the 300 to 500 millisecond range are typical. This is very fast for a supervisory network. It is almost certainly not fast enough for an automation network. ControlLogix I/O Rack, Module, or Produced/Consumed Tag connections on EtherNet/IP time out at 4x the RPI value, with a 100 millisecond minimum timeout.

In your system, it sounds like the I/O connections are local to the SPIDER switches and don't cross the HiperRing links between the managed switches (are these RS20s?). Even in the event of a ring failure or during a ring-healing process, those local connections should not fail. Connections that cross between the ring switches will likely fail during a ring break/heal incident.

But you cited an alarming and unusual failure involving an Input module... that's a separate discussion.
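To put numbers on that timeout point: with the 4 x RPI rule and the 100 ms floor mentioned above, only fairly slow connections would ride through a ring heal in the 300 to 500 ms range. A quick back-of-the-envelope check (the RPI values here are just examples):

```python
# EtherNet/IP connection timeout vs. HiperRing healing time (example RPIs only).
def connection_timeout_ms(rpi_ms: float) -> float:
    """I/O connection timeout: 4 x RPI, never less than 100 ms."""
    return max(4.0 * rpi_ms, 100.0)

RING_HEAL_MS = 500.0  # upper end of the 300-500 ms healing times cited above

for rpi in (10, 20, 50, 100, 200):
    timeout = connection_timeout_ms(rpi)
    verdict = "rides through" if timeout > RING_HEAL_MS else "likely drops"
    print(f"RPI {rpi:3d} ms -> timeout {timeout:4.0f} ms -> {verdict} a {RING_HEAL_MS:.0f} ms heal")
```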

That sounds really, really strange. A damaged or malfunctioning 1756 series input module should at most be able to "bring down the chassis" and cause faults or crashes in the devices connected to the 1756 chassis. If it caused a shutdown of the power supply, for example, or caused the ControlBus to fail completely, it could fault every module in the chassis, including the controller and the Ethernet modules. But I can't see how an I/O module in a 1756 series chassis could cause network traffic or other disruption on an unrelated Ethernet link that would "crash" an Ethernet switch or the attached ring network. There are a lot of isolators involved (the 1500 volt backplane isolator on the 1756 module, plus the isolation transformers on the Ethernet link), but there's still an electrical connection. A local 1756 I/O module shouldn't cause Ethernet traffic at all.

In tech support, I'd be peppering you with questions, so here goes:

1. What is the exact model number of the Input module?
2. Is the connection between the Input module and the ControlLogix local (backplane) or over Ethernet?
3. Is the "Major Fault If I/O Connection Is Lost At Runtime" box for that connection checked?
4. What is the RPI of that connection?
5. Are there any Listen Only or other remote connections to that module?
6. When the CPU crashes, exactly what are the states of all of the LEDs on the CPU? Did you capture a fault code? If the CPU has v17 firmware, is there a crash log file on the CF card?
7. When the 1756-ENBT modules crash, exactly what is displayed on their dot matrix displays? What is the state of each LED on the module?
8. When the connected Hirschmann switch "crashes", exactly what symptoms do you see?
9. How is the system restored to correct function? List events like power cycles and the devices involved in them.

It's easy. Try troubleshooting an ARCNET/ControlNet system just once and you'll find out how to justify the ROI. The cost of buying enough oscilloscopes alone should do it. Oh wait, you were talking about going the other way. There are substantial advantages of ARCNET over Ethernet (err, ControlNet vs. EtherNet/IP):

1. Almost every PC vendor walked away from it in the 80's. These days the NIC cards are outrageously expensive and rare. This means that for vendors, the profit margins on ControlNet are substantial compared to selling commodity NIC cards for Ethernet.

2. ControlNet can use coaxial cable, which is more expensive than twisted pair. Again, score one for the vendor.

3. ControlNet does not have any of the troubleshooting tools and ease of network topology configuration that you get with Ethernet. Among other things, you frequently end up using oscilloscopes to troubleshoot the coaxial cabling, much like the old "thin Ethernet" (coax) days, since cable and termination issues (reflections) are so much easier to cause with ControlNet. So it's great if you are running a service company, because you can charge a fortune troubleshooting a niche product like this.

4. ControlNet requires a special software program to configure the network instead of this being built into the Ethernet topology. So ordinary end users (and IT) can't mess with your network.

5. Half-duplex Ethernet, or operating an Ethernet network beyond system capacity without IGMP control, makes Ethernet nondeterministic, whereas this is not possible with ControlNet. So you can protect end users from the topology and configuration flexibility of EtherNet/IP... basically, users "can't screw it up" (for better or worse). Funny how AB only talks about this last item as an "advantage".

A single node can generate up to 20,000 packets/second (roughly the limit of a 100 Mbps connection). The NIC cards are capable of supporting this level of transmission AND reception, and Ethernet switches and PC hardware and software can handle this level of sustained communication. However, PLC hardware, regardless of manufacturer, is not nearly so well endowed with multicore Pentium processors and the like. ENBTs and Flex I/O AENTs are limited to processing about 5,000 packets per second, and PowerFlex drives are limited to less than a thousand packets per second. These limits apply regardless of whether the node is the intended receiver or NOT. Thus it is absolutely imperative to recognize and manage the packet-per-second rate at each node. If you exceed this amount, the behavior of the corresponding hardware is unpredictable and often results in dropped connections, lockups, and the like.

With this in mind, the first step is to control broadcast traffic. Broadcast traffic of any significance comes principally from two sources. First, if you've got PCs on your network, they are all a bunch of chatty Kathys. The best approach is, at a minimum, to separate your office LAN and your controls LAN. You can physically wire them this way, but it's just as easy to assign them to different VLANs. This doesn't create a true wall, but the effect is the same: your controls network won't see the broadcast traffic from the office LAN.

The second, much more insidious source is the EtherNet/IP devices themselves. Every "producer" in your network, whether it's producer/consumer data sharing between PLCs or I/O, generates packets at the specified RPI as multicast. If you do not set this up correctly, it is effectively sent as broadcast traffic. For example, let's suppose we have 3 racks of remote I/O. All the digital I/O uses rack-optimized connections, but there are also 3 analog cards in each rack. Let's suppose that everything operates at 20 ms RPIs, or 50 packets/second. The rack-optimized I/O uses 3 packets (1 configuration, 1 output, 1 input) per rack, or 3 x 3 x 50 = 450 packets per second. The analog I/O creates this PER I/O CARD, or 450 x 9 = 4050 packets per second. So before considering any unicast traffic, broadcasts, PCs running Logix 5000, etc., we already have 450 + 4050 = 4500 packets per second of traffic, for a relatively small amount of I/O.

If you have a switch or router generating IGMP queries and all your switches have IGMP snooping capability, then the above totals drop dramatically. At the PLC itself (if you have only one), it will truly see 4500 packets per second, and there's not much that can be done about that aside from increasing the RPIs to what the application actually calls for. Frequently analog cards can be increased to between 100 ms and 1000 ms RPIs, especially with temperature loops that are comparatively slow anyway. BUT, in your scenario, the considerable load from multicast traffic, if it is flooded as broadcast, is easily enough to cause what you described. It is also relatively easy to check for this: simply load up Wireshark (a free software application, Google it) and monitor your network with a PC. If you don't see IGMP query packets, you don't have any, and that's probably a big part of your problem. Not doing this is fatal in almost all cases as your network grows.

There is a second possible reason, though, that a single node can "take out" a network. Though rare, if a node "crashes", it is not unheard of for it to generate traffic at the maximum rate (20K packets/second) and simply spam the network. Even with proper IGMP multicasting controls in place, all of your hardware is in danger of this happening.

There are essentially four solutions. First, you can create separate networks to localize the damage. Second, you can use VLANs to accomplish the same thing; VLANs also allow you to be much more flexible in how it is handled. For instance, you can program overlapping VLANs at the PLCs to isolate the I/O cards or drives while still having lots of flexibility with troubleshooting ports, as well as not needing to buy lots of ENBT cards for the PLCs. Third, you can set up QoS (traffic prioritization), which is even better. For instance, you could specify that when a switch has to pick and choose among competing traffic flows, it allocates say 75% of the bandwidth to I/O and 25% to HMIs or troubleshooting PCs, guaranteeing that I/O takes precedence. Finally, you can set up rate limiting at each device. Rate limiting attacks the basic problem that devices have limited receiving throughput. Although prioritization and other techniques definitely help, rate limiting the amount of traffic at a given node is the one true way to stop a node from "spamming" the entire system and causing all your PLC hardware nodes to crash due to overload.
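To make the packet-per-second budgeting above easy to repeat for a specific machine, here is a small sketch using the same 50-packets-per-second-at-20-ms convention. The connection list is a made-up example, not this plant, and the ~5,000 pps ENBT/AENT figure is the rough limit quoted above rather than a datasheet value.

```python
# Packet-per-second budget sketch (hypothetical connection list).
def pps(rpi_ms: float) -> float:
    """Packets per second produced by one connection at a given RPI."""
    return 1000.0 / rpi_ms

# (description, number of connections, RPI in ms) - adjust to your own machine.
connections = [
    ("rack-optimized I/O, 3 packets per rack x 2 racks", 6, 20.0),
    ("analog cards, one connection each",                6, 100.0),
    ("produced/consumed tags between controllers",       2, 50.0),
]

total = 0.0
for name, count, rpi in connections:
    load = count * pps(rpi)
    total += load
    print(f"{name:<50} {load:7.0f} pps")
print(f"{'total arriving at the ENBT':<50} {total:7.0f} pps")

ENBT_LIMIT_PPS = 5000.0  # rough figure quoted above for ENBT/AENT modules
if total >= ENBT_LIMIT_PPS:
    print("Over the ~5,000 pps limit - raise RPIs, add IGMP snooping, or segment the network")
else:
    print("Within the ~5,000 pps limit")
```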

Thanks Ken. You are correct in your assumption that the switches are RS20s and that the Flex I/O for each machine should not have to cross the HIPER-Ring.

1. 1756-IB16/A, part no. 96258877 A01.
2. The connection is local; the card sits in slot 7 of a 13-slot rack which also includes the CPU and an ENBT (this ENBT connects to an 8-port switch which has a Cat 5 to it from one of the managed switches on the HIPER-Ring).
3. No it's not; as standard we do not have this box checked (there is probably no rhyme or reason for this other than that it is the default setting).
4. Not sure what you mean, as the card is in the chassis.
5. Not to this module; obviously there are other connections through the ENBT card.
6. We don't currently use CF cards (although I am changing this). Fault lights: OK light on solid red. No fault code captured, as I am kicked off the network when the card is placed in the rack.
7. OK light solid red, NET light flashing red.
8. This is a tough one to answer, but I think it goes like this:
Power off the chassis containing the rogue I/O card, CPU and ENBT.
Remove the rogue I/O card.
Power on the chassis.
Try to communicate with the processor; the ENBT will have reverted back to BOOTP, so set the network address.
Once the ENBT has comms, try going online with the processor; the CPU will require a firmware reload and then a program download.
Then we are up and running.

Firmware is stored in something firm...EEPROM. If you wiped out the EEPROM, then the processor "bricks" (junk to anyone but AB). How/why did you manage to do a firmware flash? If this is standard practice, you do realize that EEPROM has a finite amount of writes before it is "used up" and that there are huge risks (possibility of bricking a device) every time you do it? And that you should never, ever do it across a network except if you physically isolate (unplug everything but switch, PLC, and PC, or use a crossover cable)?

1 - I know it's the 1756-IB16 causing the issue, as I replicated it on my desk by getting a 13-slot rack up and running and then plugging the faulty module in (see previous posts).
2 - ControlLogix CPUs are shipped without the firmware in them, so the first thing you always do is update the firmware using ControlFLASH. This is the exact same way I updated the firmware when it asked me to do so after faulting the system.

Just my humble 2 cents. We have a number of ControlLogix systems with Safety I/O, drive I/O and SCADA PC access. We've found that using three Ethernet cards in each Logix rack, while costly up front, is fairly bulletproof and safer in the long haul. We use the first Ethernet card for SCADA I/O, the second Ethernet card for drives and I/O modules, and the third Ethernet card for Safety I/O. We obviously have three physically separate networks in this scenario. Color coding of cables is a must. But we've not had issues with I/O hosing SCADA or safety. Again, just my humble 2 cents.

Just a skim through and I have this question along the same lines... Do we know if it's an IO card fault or a SLOT fault? Was a different IO card used in the same slot, or the faulted card used in another slot, maybe in another chassis? And yeah, firmware reload... I'd replace the PLC as Paul suggested. What made you think you needed a firmware reload? I've had a PLC go stupid on me once, couldn't d/l to it. I started a fresh, blank program with only the correct PLC (SLC) put into it, and downloaded that, then I was able to re-download my program and get it running.

This is my plan going forward; it seems the only way to ensure no cross-network issues. Thanks Bob, it's good to hear someone else is doing it this way.

