Thoughts on Verification: The Verification Mindset (Part 2 of 2)
In part 2, Verilab consultants Alex Melikian and Jeff Montesano continue their discussion on the topics covered in the “Verification Mind Games” paper Jeff co-authored and published at DVCon 2014. Part 1 can be viewed here.
Alex Melikian: So as your paper explains, and as we’ve determined in this conversation, one part of the verification mindset is determining ‘what’ has to be verified. However, there’s something else, something that involves taking a step back and asking: “Can a particular scenario actually happen with the DUT?” In your paper, you give the example of a design that implements the drawing of shapes, and a test scenario where this design is given the task of drawing a circle with a radius of zero. Describe this particular situation for our readers.
Jeff Montesano: Yeah, this was an interesting case. It was a design that was responsible for drawing circles, and it took the radius as an input. The designers at the time specified the range of radii that the design needed to handle. When we asked the question, “Well, what happens if you input a radius of zero?”, the designers came back and said, “It’s invalid. You don’t need to test it. We didn’t design the thing to handle it, and there’s no point in testing it.” And while some verifiers might stop there and say, “Well, if the designer thinks it’s not valid, we shouldn’t test it,” we decided to go ahead with this case anyway.
In fact, we were able to show that in the broader system, there were circles of radius zero being generated all the time, and so the design genuinely had to handle it. When we finally ran a test with zero radius, it resulted in the design hanging. So as it turned out, it was very important that we brought this up, in spite of what the designers had suggested to us.
AM: This looks like a case where the verification engineer has to be an independent thinker, and consider all the possibilities that could be applied to the design, as opposed to following only what the design spec spells out as handled.
On the other hand, I can understand the counterargument to this. As always, time is limited, and the verification engineer must judge the verification requirements carefully. For example, they have to ask themselves: “Is it worth verifying something that is invalid? Are we dealing with a case of garbage in, garbage out? Or is there a specific requirement stating that a particular condition cannot occur?” This all sounds easy, but it’s really not.
Oftentimes, the verification engineer is the first one to try out all the possibilities or combinations of how a design feature is used, especially when constrained-random stimulus is applied. This can lead to particular cases of stimulus, some of which fall outside the intended use of the feature. Then again, it may lead to a corner case that is genuinely valid but wasn’t accounted for in the design.
So there will be situations where the verification engineer is in uncharted territory. It can be a fine line between an invalid case and a corner case nobody thought of. However, I think going back to that initial question really helps: “What am I verifying here?” or, in other words, “Am I doing something that is of value in verifying the design?”
JM: Totally agree with everything you’ve said there. There is a fine line, and it takes experience to determine what is useful to go after and what is a waste of time. I can recall a client I worked with in the past who apparently had been burned by some type of internal issue in post-silicon, and so they wanted verification of the parity between interfaces of the different RTL blocks. Now, if an ASIC has issues that leave it unable to communicate properly within itself, it would never pass the scan test - it would just be sorted into the garbage bin. So a verifier should never try to verify parity between sub-blocks, because it’s a waste of time. It turned out in this case we didn’t have a choice. But as expected, we never found a bug there.
AM: Moving on to another topic in your paper, you mention that testing a design in creative ways is a core competency of the verification mind. What are some of the creative ways that you’ve done verification?
JM: I can think of something that came up recently. I was verifying an interrupt handler, and it was a circuit of the ‘write-one-to-clear’ type. I had a test running where I would generate the interrupt, read it, write a one, read it back, make sure it got cleared to zero, and I thought I was done. However, the question arose: “What would happen if you wrote a zero to those bits?” Now, at that point, I could’ve written a brand new test which would generate the interrupt, write a zero, give it a brand new name, do all that stuff.
But I thought of a more creative way to do it. What I was able to do was take the existing test class, make an extension of it, and add a virtual function that was defined to be empty in the base class [the original test] but defined to write zeros to all the interrupt registers in the derived class [the new test]. The original test was then modified to call this virtual function right after having provoked all of the interrupts, and the rest of the base test would proceed exactly as before.
So all the checks from the initial test would fail if any of those writes with zeros had had any effect. And so with very few lines of code, I was able to leverage what I’d already done, and use virtual functions in a creative way.
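To illustrate the pattern Jeff describes, here is a minimal UVM-style sketch. The class and task names are illustrative, not taken from the paper, and the hook is shown as a virtual task since register accesses normally consume simulation time.

```systemverilog
// Base test: provokes the interrupts, calls an empty hook, then runs the
// existing write-one-and-check sequence.
class base_irq_test extends uvm_test;
  `uvm_component_utils(base_irq_test)

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  // Placeholders for the real bench's sequences and register accesses.
  virtual task provoke_all_interrupts();      endtask
  virtual task write_one_and_check_cleared(); endtask

  // Hook: empty in the base test, overridden in the derived test.
  virtual task post_irq_hook();
  endtask

  virtual task run_phase(uvm_phase phase);
    phase.raise_objection(this);
    provoke_all_interrupts();
    post_irq_hook();               // does nothing in the base test
    write_one_and_check_cleared(); // existing checks run unchanged
    phase.drop_objection(this);
  endtask
endclass

// Derived test: only the hook changes.
class write_zero_irq_test extends base_irq_test;
  `uvm_component_utils(write_zero_irq_test)

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  // Write zeros to every interrupt register; if those writes have any
  // effect, the base test's existing checks will fail.
  virtual task post_irq_hook();
    write_zeros_to_all_interrupt_regs();
  endtask

  virtual task write_zeros_to_all_interrupt_regs(); endtask
endclass
```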
AM: So you’re not only creative in what you’re testing, but also creative in how you reuse the checks and the tests.
JM: You bet. Sure.
AM: So, following this theme of creativity, another area where I think creativity is important is “debug-ability”. What I mean by “debug-ability” is a measure of how easy it is to debug something using the test bench or VIP you’ve created. One example I’ve personally seen is the generation of HTML files to help visualize information or data processed in a simulation, making it much easier for verification engineers or designers to debug something. Tools like this become very useful for designs that implement serial data protocols.
I think it should be mentioned that making debugging easier is part of our job of verifying. You touch on this in your paper and state that prioritizing ‘debug-ability’ is important, and that sufficient time should be allocated to develop tools like the one I mentioned. What about you? What have you seen?
JM: Again, I totally agree. Designers don’t really need to think about ‘debug-ability’ in their day-to-day work. They have other things to think about, right? They have to think about making their design meet the specification with sufficient performance, and with no bugs in it. Whereas when you’re writing code to do verification, debug-ability is right up there.
So one example of this is if you have bidirectional buses on a VIP. Now, if you were to implement a single-bit bidirectional bus in your VIP, you could create an interface that has a single bit, declare it as an “inout” in SystemVerilog, for example, and you’d be able to implement the protocol correctly. However, at the test bench level, if you had multiple instances of these, or one or more designs under test communicating with your VIP, you would never be able to tell who was driving and who was receiving, because your VIP only has one signal, and you’d probably have to resort to print statements at that point.
A better way to do it is to split the bidirectional bus into a driver and a receiver at the VIP level, and that way you have visibility into whether the VIP is driving or receiving. In the paper, I give some code examples of how that can be done.
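The following is a minimal sketch of that idea, not the code from the paper; the interface and signal names are illustrative. Instead of a lone inout inside the VIP, the VIP keeps separate drive and receive views plus an output-enable, and the tri-state resolution happens in one place.

```systemverilog
// Illustrative single-bit bidirectional interface for a VIP.
interface sda_if (inout wire sda_pad);
  logic sda_out; // value the VIP wants to drive
  logic sda_oe;  // 1 when the VIP is driving, 0 when it is receiving
  logic sda_in;  // value observed on the pad

  assign sda_pad = sda_oe ? sda_out : 1'bz; // single tri-state resolution point
  assign sda_in  = sda_pad;                 // receive view
endinterface
```

In a waveform viewer, sda_oe makes it immediately obvious whether the VIP or the DUT is driving at any given time, which a single inout signal cannot tell you.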
AM: That’s a good tip, tri-state buses can be tricky at times. One last topic I’d like to touch upon from your paper is the statement that the verification mindset should focus on coverage, not test cases. Can you elaborate on that?
JM: Sure. So we now have these amazing tools like constrained random verification. But it’s hard for us sometimes to break away from the tradition of writing a lot of tests. It turns out that the more tests we write, the less we’re making use of the power of these tools. Ideally, you would have one test case that randomizes a configuration object, you’d have a whole lot of self-checking in your environment, and you’d just rerun that same test case as many times as you can. Thousands of times. And through that mechanism, you’d turn up a whole lot of bugs, and make full use of the tools.
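Here is a minimal sketch of that “one randomized test” idea, assuming a UVM bench with self-checking scoreboards already in place; all names are illustrative.

```systemverilog
// Randomizable configuration object.
class dut_config extends uvm_object;
  `uvm_object_utils(dut_config)

  rand int unsigned num_frames;
  rand bit          parity_en;

  constraint c_frames { num_frames inside {[1:100]}; }

  function new(string name = "dut_config");
    super.new(name);
  endfunction
endclass

// Single test: randomize the configuration and let the self-checking
// environment do the rest.
class random_config_test extends uvm_test;
  `uvm_component_utils(random_config_test)

  dut_config cfg;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    cfg = dut_config::type_id::create("cfg");
    if (!cfg.randomize())
      `uvm_fatal("CFG", "configuration randomization failed")
    // Hand the randomized configuration to the environment.
    uvm_config_db#(dut_config)::set(this, "*", "cfg", cfg);
  endfunction
endclass
```

The same test is then rerun under many different seeds, using whatever seed option your simulator provides, so every run exercises a different configuration against the same set of checks.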
AM: It sounds like you’re talking about an overuse of directed test cases instead of exploiting the constrained-random approach. I have seen this too; usually at the start of a project there’s a lot of pressure to generate test cases that are simple and straightforward … and I would have to say that it would be normal to expect this, because the initial version of the design only has a limited feature set.
However, it’s really important for verification engineers to keep in mind the ‘law of diminishing returns’, if I can borrow the term from the field of economics. What I mean is that there comes a point in the project where directed test cases become less and less efficient, because they have a decreasing chance of finding bugs. Eventually, the verification team has to let go of the directed approach, make the jump to the constrained-random approach and go on from there. I think we agree that it’s the only way to efficiently find undiscovered bugs.
JM: Yeah. Just to add to the topic of coverage – there’s an important point that is sometimes missed, which is that coverage and checks need to be almost married together. Reason being, you don’t do checks without coverage, and you don’t do coverage without checks. Why? Because if you do coverage without checks you might think you hit something, but you weren’t even checking at the time. It’s a false positive, and is in fact very dangerous. The opposite is if you were to do checks without coverage. In that case, you’re randomizing things and doing a bunch of checks, but you don’t know which cases actually occurred, especially in the type of “one test” approach that I’m describing here.
AM: Definitely. When constrained random coverage-driven verification is done, both the constrained random and the coverage parts have to be developed together. Coverage is the only indicator you can use to convince someone not familiar with constrained-random that a situation has been exercised, and therefore that there’s no need to write a directed test for it. It definitely pays dividends to go the constrained random way and collect coverage as soon as you can, as opposed to trying to hit each and every situation one at a time.
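To make that pairing concrete, here is a minimal sketch of one way to keep checks and coverage married together; the scoreboard and coverage names are illustrative, not from the paper. The covergroup is sampled only at the moment a check is actually performed, so a covered bin always corresponds to a checked case.

```systemverilog
// Illustrative scoreboard that samples coverage only alongside its checks.
class pkt_scoreboard extends uvm_scoreboard;
  `uvm_component_utils(pkt_scoreboard)

  int unsigned pkt_len;
  bit          parity_en;

  covergroup checked_cases_cg;
    cp_len    : coverpoint pkt_len { bins small = {[1:15]}; bins large = {[16:255]}; }
    cp_parity : coverpoint parity_en;
    len_x_par : cross cp_len, cp_parity;
  endgroup

  function new(string name, uvm_component parent);
    super.new(name, parent);
    checked_cases_cg = new();
  endfunction

  // Called for every observed packet: the comparison result and the coverage
  // sample live side by side, so coverage never records an unchecked case.
  function void check_packet(int unsigned len, bit par_en, bit pass);
    pkt_len   = len;
    parity_en = par_en;
    if (!pass)
      `uvm_error("CHK", $sformatf("packet check failed (len=%0d)", len))
    checked_cases_cg.sample();
  endfunction
endclass
```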
I think that’s a good note to end on. Jeff, thanks again for joining me on this edition of ‘Thoughts on Verification’.
JM: Thank you, Alex. Take care.