Rod_Hackney

ControlLogix redundancy

3 posts in this topic

We traditionally use the standard Allen Bradley redundancy setup for ControlLogix hardware. It has been proposed that we move to a standard non-redundant setup with only redundant power supplies and two network cards to provide network redundancy to our HMI. I wanted to get the forum's experience with redundant systems. Thanks in advance for your input.

A lot depends on the way your HMI handles dead links. I've done something like this with RSView32 and RSLinx Classic. My communications between the HMI and the controller run through an Alias Topic in RSLinx Classic that can switch between two different 1756-ENBT modules that are physically on different networks. The RSView32 Event Detector sits in the background monitoring the @IsPresent predefined item for the 1756-ENBT module route to the processor. If that ever goes to zero, RSView32 issues a NodeSwitch command to the Alias topic and the HMI starts communicating through the alternate topic, and therefore through the alternate network and 1756-ENBT. On this system, we consider an HMI comms failure to be a major issue and we start shutting down the process in an orderly way. Return to the original communications path is done with a manual NodeSwitch command.
The main drawback to OPC Alias Topics in RSView32 is that you can't browse tags through the Alias topic; during development I did a lot of browsing through the Primary topic, then manually changed the Topic name once the tag was defined. I don't know how you might do this with RSLinx Enterprise and RSView SE, and I haven't tried it with OPC clients other than RSView32.
A bigger kettle of fish starts to boil when you consider redundant networking to I/O systems. ControlNet is the way to go. I've been through a bunch of discussions about how using multiple 1756-ENBT modules in each remote I/O chassis "should be easy". It's not. That's not how the network, the controllers, or the protocol works.
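The HMI path failover described earlier in this post (watch a presence flag, switch to the alternate path when it drops to zero, return only on a manual command) can be sketched as a small state machine. This is a plain Python model for illustration, not RSView32 or RSLinx code: the `PathSwitcher` class, the path names `ENBT_A` and `ENBT_B`, and the presence dictionary are all hypothetical stand-ins for the Alias Topic, the two 1756-ENBT routes, and the @IsPresent item.

```python
from dataclasses import dataclass, field


@dataclass
class PathSwitcher:
    """Toy model of the alias-topic failover (all names are hypothetical)."""
    primary: str = "ENBT_A"
    alternate: str = "ENBT_B"
    active: str = "ENBT_A"
    shutdown_requested: bool = False
    log: list = field(default_factory=list)

    def on_poll(self, is_present: dict) -> str:
        """Call on every poll with a {path: 0 or 1} presence map."""
        if not is_present.get(self.active, 0):
            other = self.alternate if self.active == self.primary else self.primary
            if is_present.get(other, 0):
                self.log.append(f"switch {self.active} -> {other}")
                self.active = other
            else:
                self.log.append("no healthy path")
            # The post treats any HMI comms failure as a major issue,
            # so an orderly shutdown is requested either way.
            self.shutdown_requested = True
        return self.active

    def manual_return(self, is_present: dict) -> bool:
        """Operator-initiated return to the primary path; never automatic."""
        if self.active != self.primary and is_present.get(self.primary, 0):
            self.log.append(f"manual return {self.active} -> {self.primary}")
            self.active = self.primary
            return True
        return False
```

Note that `on_poll` never switches back on its own when the primary path recovers; that mirrors the manual NodeSwitch used for the return path and avoids flapping between networks on an intermittent fault.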

First you have to ask yourself why you are doing redundancy at all, and what you are trying to protect yourself from. Second, even though you wouldn't normally look there, the documentation from Rockwell on using ControlLogix for safety purposes has a wealth of information about performance, including failure probabilities and data to calculate the number of hours of expected operation before a "dangerous" failure (different from a "safe" one, but useful nonetheless). A standard CLX processor has around 10^-9 reliability compared to a typical 80's vintage PLC, which was in the 10^-5 to 10^-6 range. So it's already operating several orders of magnitude better than before. The safety PLC (SIL 3) version contains 2 processors, is effectively a single unit with redundancy built in, and has a reliability of 10^-12. However, there's a difference in implementation goals here because that processor is not set up for fault tolerance. Instead it's set up for safety. So if there is a discrepancy between the two processors at any point of operation, the system goes into fail-safe mode and the processors are faulted out, which is just the opposite of the reason to use ControlLogix redundancy. However, the result shows the direction that it is going in. When calculating failure rates for safety purposes, wiring and communications are usually ignored because the failure rates are typically in the range of 10^-12 to 10^-15, so they simply don't contribute anything meaningful to the result. Sensors are usually around 10^-5 to 10^-6, so redundancy there tends to be more important, and output actuators are almost always in the 10^-2 to 10^-3 range, so the overall system reliability is usually limited by the actuators. Keep in mind that again, this is not looking at it from a fault tolerance point of view. It is strictly looking at the problem from a safety point of view: that the system works as expected when called upon to take steps to prevent an accident.
But the basic data can still be used for other systems. You can get this kind of data from several sources including IEEE 500, the IEEE gold book, the AB ControlLogix safety documentation, the TUV database, OREDA, and others. However, these tend to be averages. For instance, the IEEE gold book is usually concerned with power distribution and the OREDA database deals with oil platforms out in the North Sea. So you also have to color it with local data. For instance, although I would agree almost absolutely with the estimates of communications network reliability WITHIN A PANEL or even within a small structure such as an MCC or "E-house", I strongly disagree with those results when it comes to running communications lines across heavy industrial plants such as steel mills, foundries, or mining facilities, which is the kind of place where I work. In those environments the communication system, if allowed to age naturally, can achieve 10^-12 to 10^-15 or probably even better. However, all the other activities in those environments tend to actually damage the cable (incineration or physical damage) when an accident happens. So although reliability can be very good, depending on the exposure, it might be worse than an actuator! Based on the above results, what I find is that the PLC itself is so reliable that ControlLogix redundancy is a total waste of time and money. You are adding redundancy to the most reliable component of the system, so the overall net improvement is not even measurable. Couple that with the various version issues that they've had over time, the expense, and the fact that ControlNet issues are so difficult to troubleshoot, and the net effect is that it's a waste of money. Power supplies are another matter. They do go bad. And AB publishes the reliability data. So especially if you have multiple power sources, this may indeed be well worth the money.
I'm not sure whether this still holds true if you properly protect the power supply itself with surge and harmonic protection. Finally, we get to the cabling. As I said, communication modules are hardly ever an issue. It's damage to the cabling itself that most concerns me. So at a minimum, I'll write in some code to check communications and react accordingly. If there are long cable runs, the ideal situation is to use either redundant stars or ring networks (cheap), both of which allow you to have one cable damage failure with no loss of PLC connectivity. Both can be easily done by buying the right kind of switches and simply adding the additional cabling (not in the same conduit or run!). Multiple ENBT cards do not really gain you anything except additional complexity because ENBT cards do not have programming in them to automatically fail over like a switch running a ring protocol or RSTP. I'm not worried about the ENBT/switch link at all if it's in the same cabinet. Most industrial duty Ethernet hardware (not Cisco) is capable of triggering switch contacts or sending a status packet to indicate the network status so that you can alert someone to make repairs when a network connection is broken. ControlNet has this same capability, but troubleshooting problems with ControlNet is virtually impossible. If it's intermittent or a grounding issue, it's nearly impossible to find. You end up trying to use scope traces to maybe sort of catch it sometimes. In addition, the cards are 2 to 3 times more expensive than the Ethernet version of the solution. The only solid advantage is that effectively the 2-module solution with Ethernet (ENBT attached to a switch which implements redundancy) becomes a 1-module solution (CNET card and no switch). You are limited to buses or a single ring with ControlNet. Dual rings and general topologies (RSTP with say a wireless backup link) are not possible.
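The point about rings tolerating one cable failure can be checked with a small graph sketch. This is only an illustration of the topology argument: the switch names and link lists are made up, the model covers the switch-to-switch cabling only (matching the remark that the in-cabinet ENBT/switch link is not the concern), and it says nothing about how a ring protocol or RSTP actually reconverges.

```python
from collections import deque


def connected(nodes, links):
    """Breadth-first search: True if every node is reachable from the first."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, queue = {nodes[0]}, deque([nodes[0]])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)


def survives_any_single_cut(nodes, links):
    """True if the network stays connected after removing any one link."""
    return all(
        connected(nodes, links[:i] + links[i + 1:]) for i in range(len(links))
    )


# Hypothetical plant: four switches joined by long cable runs.
switches = ["SW1", "SW2", "SW3", "SW4"]
chain = [("SW1", "SW2"), ("SW2", "SW3"), ("SW3", "SW4")]
ring = chain + [("SW4", "SW1")]  # one extra run closes the loop

chain_ok = survives_any_single_cut(switches, chain)  # False: any cut splits it
ring_ok = survives_any_single_cut(switches, ring)    # True: one cut is tolerated
```

A second simultaneous cut still splits the ring, which is why the status contact or status packet mentioned above matters: the first break has to be reported and repaired before another one happens.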
All this taken into account, you would be far better off concentrating on spending your money on redundant actuators, or sometimes sensors. Attacking something with a reliability rating in the neighborhood of 10^-5 is not going to get you very far if you have another part of the system with a reliability rating of 10^-2 (1000 times more likely to fail). If downtime is the issue, keep an extra processor (and ENBT) card on the shelf. Then you are simply limited by the MTTR (mean time to repair) for the most part.
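To put rough numbers on that argument, here is a minimal sketch that treats the figures quoted in this thread as independent per-demand failure probabilities for elements in series. That interpretation, the independence assumption, and the specific values picked from the quoted ranges are assumptions made for illustration; real safety calculations also have to account for common-cause failures.

```python
def series_failure(probs):
    """Probability that at least one element of a series chain fails,
    assuming the failures are independent."""
    ok = 1.0
    for p in probs:
        ok *= 1.0 - p
    return 1.0 - ok


def redundant_pair(p):
    """An identical, independent pair fails only if both units fail."""
    return p * p


# Values picked from the ranges quoted in the thread (assumed, not measured).
SENSOR, PLC, ACTUATOR = 1e-5, 1e-9, 1e-2

baseline = series_failure([SENSOR, PLC, ACTUATOR])
plc_redundant = series_failure([SENSOR, redundant_pair(PLC), ACTUATOR])
actuator_redundant = series_failure([SENSOR, PLC, redundant_pair(ACTUATOR)])

# Improvement factors relative to the non-redundant baseline.
gain_from_plc = baseline / plc_redundant
gain_from_actuator = baseline / actuator_redundant
```

Under these assumptions, duplicating the processor changes the system figure by about one part in ten million, while duplicating the actuator lowers it by a factor of roughly 90, at which point the sensor becomes the next thing worth looking at.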
