Thoughts on Verification: Doing Our Work in Regulated Industries
In this edition of “Thoughts on Verification”, Verilab consultant Jeff Montesano interviews fellow consultant Jeff Vance on verification in regulated industries. Jeff Vance has extensive verification experience in the regulated nuclear equipment industry. The discussion explains the role of regulators and how they can affect verification processes as well as interactions within the team. The two also discuss the challenges involved and how innovation manifests in such an industry.
Jeff Montesano: Hi, everyone. Welcome to another edition of Thoughts on Verification. I’m pleased to have my colleague Jeff Vance here with me to discuss his experience working in regulated industries and how it impacts verification. Jeff, thanks for joining me.
Jeff Vance: Thanks. Happy to be here.
JM: So let’s talk a little bit about what you think are the primary differences between working in regulated industries, such as nuclear and military, versus unregulated industries, where you’re making commercial products that might go into cell phones and things like that.
JV: Yes. My experience is mostly in the nuclear industry, working on safety critical systems for the automation of nuclear power plants. There are a lot of differences working in that domain compared to most non-regulated industries. The biggest difference is you have a regulator such as the Nuclear Regulatory Commission (NRC) who has to approve the work you’re doing. So there’s a huge change to priorities. There’s a change to the daily work that you do, the mindset of the people and how the work is done. Ultimately, it’s not enough just to design your product and catch all your bugs. You have to prove to a regulator that you designed the correct thing, that it does what it’s supposed to do, and that you followed the correct process.
JM: I see, I believe we’ve covered something like this before with the aerospace industry. So you said there’s a difference in priorities, can you give me an example of what types of priorities would be different?
JV: I think the biggest difference is that you must define a process and prove that you followed it. That’s how you prove that the design has no defects. So even if you designed the perfect product and the verification team found all the bugs, there will still be an audit. They’re going to challenge you, and you’re going to have to prove that everything you did is correct. The primary way to do this is to define a process that the regulator agrees is good and create a lot of documentation that demonstrates you followed it. If you can prove that you followed that process throughout the entire life cycle of the product, that demonstrates to an auditor that your design is correct and can be used.
JM: Okay, who comes up with this process? The verification team?
JV: There are industry standards that have established what these processes should be through lessons learned over the years. For example, the NRC’s quality assurance criteria are defined in part of a Code of Federal Regulations (10 CFR 50, Appendix B). They also published Regulatory Guide 1.168 to give guidance on how to meet these regulations. This guide then endorses various IEEE standards such as IEEE-1012 for verification and validation. Manufacturers of components for nuclear power plants create their internal process to be consistent with these standards.
JM: Okay. So it seems to me that it’s based a lot on historical experience. But I’m trying to figure out something: in the last ten years, verification has changed quite a bit. Constrained random is the name of the game.
JV: Right.
JM: How does this fit in with these more historical types of approaches?
JV: Yes, there is a big challenge using modern verification techniques in the nuclear industry. There’s been a long history of developing software-based designs. So all of the standards and procedures for how to design these systems are very much centered around software development. New systems are FPGA based, but the industry is still working in a software-based process. That makes it challenging to do proper verification using modern constrained random methods.
JM: Okay. In your experience, did you meet resistance when you were trying to introduce constrained random?
JV: Yes. It can be challenging to apply newer verification methodologies in an environment where there’s a lot of history of doing things a certain way. Once you’ve proven that a certain approach works, it’s usually recommended to stick with what works and not try something new like constrained random. It is very different from traditional software-based directed tests.
JM: Does it make sense to use constrained random in this domain?
JV: Constrained random verification is actually a perfect fit for safety-critical domains. This methodology allows us to generate random test cases within a given set of constraints. One of the biggest advantages of this is now there is a chance you will generate test cases that nobody thought about. This is a perfect fit because most of the standards are intended to anticipate human error and account for it in the process. There’s a lot of well established ways of dealing with this, such as having independent teams, having diverse technologies, and having redundancy in your systems.
JM: Which sounds a lot like what constrained random does for you.
JV: Yes. Constrained random verification is another way to avoid human error. If we happen to make mistakes in our verification plan and overlook some cases, this approach will still test those things.
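In practice, constrained random stimulus is usually written in SystemVerilog with constraint blocks, but the core idea can be sketched in a few lines of Python. The packet fields and constraints below are hypothetical, purely for illustration:

```python
import random

def random_packet(seed=None):
    """Generate one random transaction that is legal by construction.

    Hypothetical constraints: length must be 1-64 bytes, and
    'priority' traffic is limited to short packets -- the kind of
    rule a SystemVerilog 'constraint' block would state declaratively.
    """
    rng = random.Random(seed)
    pkt = {
        "length": rng.randint(1, 64),        # constrained range
        "kind": rng.choice(["data", "priority"]),
    }
    if pkt["kind"] == "priority":
        pkt["length"] = rng.randint(1, 8)    # tighter constraint
    return pkt

# Thousands of such draws explore corner cases no directed test plan
# anticipated, while every packet stays within the legal space.
packets = [random_packet(seed=i) for i in range(1000)]
```

The point Jeff makes above is visible here: the engineer specifies only what is *legal*, not an explicit list of cases, so combinations nobody thought to write down still get exercised.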
JM: Totally agree. Switching gears, we’ve been talking a lot about high-level methodology and industry’s approach and a verification engineer’s general approach. Can you give me more of a day-to-day picture – is it any different? I mean you’ve worked in both safety-critical and unregulated commercial industries.
JV: Yeah, there are some differences in the day-to-day work. One difference is there are more formalities in the communication between people and sometimes limited access to information. It’s standard to have more separation between the design and verification teams.
For example, a lot of people are used to having casual conversations with other engineers to solve their problems and answer questions. That should be avoided in a safety-critical domain. The idea is there should be more independence between the verification team and the design team.
JM: If the verification engineer can’t talk to the designer, then where does he go when he has questions about the design?
JV: So rather than the verification engineer getting his information from the designer, that person should be getting the information from specifications. The requirements spec is given to both the design and verification teams, and they’ll independently draw their interpretations of what the design should do. If there’s something that is not clear or they don’t understand, it becomes a bug that is filed in the bug-tracking system. The issue is resolved that way rather than through a hallway discussion. The primary motivation there is that now you have visibility of all these issues and how you solved them. You have proof that you followed the process, and a third party can see the evidence.
JM: Are there times when you would want to have that conversation and the situation makes it impossible? Or is it the culture that you should not be having those conversations?
JV: It’s not like people are forbidden from speaking to each other. People know the limits of what can be discussed. Sometimes legitimate discussions of issues occur in meetings or through email. When that happens, the details of the discussion are written down, usually in a bug ticket.
JM: Okay. So this seems quite different on a day-to-day basis. I’m imagining that it even ends up manifesting itself in a whole other mindset. I wrote a paper last year on the difference in the mindset between a verifier and a designer. Would you say that this results in a different verification mindset?
JV: Yes, definitely. Yeah, I think one of the biggest points I liked about your paper on the verification mindset is how you do have to approach these problems from a different perspective than a designer would. Thinking about how the design should be tested and how it might break is different than how a designer might approach the problem. I think in these safety-critical domains it takes that idea to a whole new level. Because now it’s not enough just to think about those questions, but now you have to think “how can I prove that everything we did is correct?”
JM: How would you say this change in mindset manifests itself in your work?
JV: Well, now you have somebody looking over your shoulder. There’s always a chance someone is going to challenge you and how you did things. So I think that’s going to change how you write your tests and how you document your verification approach. It’s also going to change how you document your coverage because ultimately your coverage metrics are a very strong way that you prove that you did your job and covered everything.
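In SystemVerilog this evidence usually comes from covergroups, but the bookkeeping behind a coverage report can be sketched in Python. The bin names and ranges below are hypothetical:

```python
from collections import Counter

# Hypothetical functional coverage model: which packet-length
# bins were actually exercised during simulation.
BINS = {"short": range(1, 9), "medium": range(9, 33), "long": range(33, 65)}

hits = Counter()

def sample(length):
    """Record one observed transaction in its coverage bin."""
    for name, bin_range in BINS.items():
        if length in bin_range:
            hits[name] += 1

# Lengths observed during a (hypothetical) simulation run.
for length in (3, 20, 60, 5):
    sample(length)

# The kind of number an auditor could trace back to the test plan:
coverage = 100.0 * sum(1 for b in BINS if hits[b]) / len(BINS)
print(f"functional coverage: {coverage:.0f}%")
```

Each bin corresponds to a planned scenario, so the final percentage is an auditable claim that the test plan was actually executed, not just written.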
JM: OK. I’m wondering to what level of detail the auditors get involved at. Would they know when your sampling moment is on a certain coverage group? Or when you show them a coverage report, is that sufficient?
JV: An audit does not typically go into extreme technical detail. What they do is they audit your proof of following the process and focus on a selection of important technical details as part of that. So they might want to see what kind of bugs you found and how they were resolved. They won’t review every bug that was found, but pick out some interesting ones. They might pick a few interesting requirements and trace them through the entire process down to implementation, verification, and coverage. If you demonstrate you did what you should have in a few technical areas, that gives confidence that everything else is OK. You demonstrated you followed the process and it is designed to catch all the bugs and produce a correct design.
JM: Okay. So now that we’ve talked a lot about the differences in mindset and day-to-day activities, having worked in both domains, have you found that bringing any of this to the non-safety-critical domain has helped you or hindered you? How has that been?
JV: I think there’s a lot of value that comes from working in a safety-critical domain. After a while you develop this conscientious mindset in all your work, and that is important for verification in any domain. You’re thinking more about the proper way of doing things. But there are some environments where that approach may not be quite as good a fit. It really depends on the type of industry and the requirements of the product. If you’re in a very fast-paced environment where you’re creating innovative products and you’re challenged with time-to-market, there’s not a lot of time for doing things really thoroughly. As we’ve covered previously in these interviews, you have to balance doing things properly with getting a working product done. Sometimes there’s just too much overhead if you’re always focused on doing things the perfect, ideal way.
JM: So I guess what you’re saying is that you have to decide which aspects to drop, and there’s some that you will retain and actually derive some value from, no matter what work you’re doing.
JV: Yeah, absolutely. I think no matter what, there are things people can learn and pick up in a regulated industry that are very applicable and can be applied to any industry without too much overhead. Obviously you’re not going to be doing the same extent of documentation or requirements tracing. But there are still ways of thinking about how your coverage should be defined.
JM: Very good. Is there anything else you’d like to discuss before we wrap up here?
JV: I think it is very interesting to experience the differences when transitioning to and from safety-critical domains. I think some engineers are a more natural fit in that environment than others. Some may not like it as much because there is less rapid innovation in the technology, and it is more about making reliable systems that do more basic things. So when there is innovation in that area, it’s more about how to make things safer or get them done faster.
JM: It’s at the process level, I would assume, more than at the algorithmic level?
JV: Yeah. But there are different kinds of rewards in working in that environment. If you take pride in following a process that ensures you’re producing quality engineering work, that can be just as strong a drive as rapid innovation.
JM: Yeah, I can imagine it’s a bit like being a civil engineer, where things haven’t changed much in the past several years, and yet a civil engineer probably gets a lot of satisfaction from building something that he knows is going to be safe for thousands and thousands of people on a daily basis.
JV: Yes, I think that’s a good comparison.
JM: Great, I think we’ll end it on that note. Thanks a lot for your time and the experience you’ve shared today.
JV: All right. Thanks.