Saturday, November 21, 2009

Advice on Process Performance Models

We have developed a Process Performance Model (PPM) for system testing defects. In our model we analyzed multiple factors (independent variables) that may influence the number of system test defects. However, the strange part is that the regression analysis eliminated all of the variables except one: testing effort. The other variables included things like the number of test cases, testers' experience, and size (LOC).

Can a valid PPM have only one independent variable? The p-value is less than 0.05 and the R-squared value is 0.8, so the model seems reliable. Only one independent variable seems rather limited and weak to me, but I can't think of a justifiable reason why the model is invalid, considering that we analyzed data from several independent variables and only one was found to be statistically significant.

What do you think?

It does sound a bit odd. Perhaps you have overlooked some other X factor that you should be evaluating. But it is conceivable to have a model with only one independent variable.

Basically your model looks like Y = mX + b. But all of the PPMs that I have seen contain multiple X factors.
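To make that concrete, here is a minimal sketch of fitting Y = mX + b with ordinary least squares and reading off the R-squared and the p-values. The variable names and the numbers are invented purely for illustration; they are not your project's data.

    # Minimal sketch: fit Y = mX + b and inspect R-squared and p-values.
    # The data below are hypothetical -- substitute your own measurements.
    import numpy as np
    import statsmodels.api as sm

    test_effort_hours = np.array([40, 55, 60, 72, 80, 95, 110, 130])   # X (hypothetical)
    system_test_defects = np.array([12, 15, 18, 20, 24, 27, 33, 38])   # Y (hypothetical)

    X = sm.add_constant(test_effort_hours)          # adds the intercept term b
    model = sm.OLS(system_test_defects, X).fit()

    print(model.params)       # [b, m] -- intercept and slope
    print(model.rsquared)     # proportion of variance explained
    print(model.pvalues)      # p-values for the intercept and the slope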

Several things could be going on here.

1. The regression analysis was faulty

2. The list of X factors evaluated was incomplete

3. One or more of the X factors were a combination of factors that should have been separately evaluated

4. Not enough data were evaluated for the regression analysis to yield proper answers

5. The distribution of the data is not a normal distribution (a Gaussian distribution, a.k.a. the bell-shaped curve) and analytic techniques that assume a normal distribution were mistakenly used (a quick way to check this is sketched below)
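As a quick sanity check on items 4 and 5, you can look at the residuals of the fitted model: too little data or a badly non-normal distribution will usually show up there. Here is a minimal sketch using a Shapiro-Wilk test, reusing the same invented numbers as the earlier sketch.

    import numpy as np
    from scipy import stats

    # Hypothetical data, reusing the invented numbers from the earlier sketch.
    x = np.array([40, 55, 60, 72, 80, 95, 110, 130])
    y = np.array([12, 15, 18, 20, 24, 27, 33, 38])

    m, b = np.polyfit(x, y, 1)        # refit Y = mX + b
    residuals = y - (m * x + b)       # observed minus predicted

    stat, p_value = stats.shapiro(residuals)
    print(len(residuals))             # item 4: a very small sample makes any analysis shaky
    print(p_value)                    # item 5: a small p-value suggests non-normal residuals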

You probably should have a statistical expert independently examine your analysis to see whether the resulting PPM really is a linear relationship between X and Y.

The other option is to use the PPM for several months and compare its predictions against what actually happens. If the PPM yields accurate predictions, then it is most likely a good PPM.
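If you go that route, even a very simple accuracy measure tracked month by month will tell you a lot. Here is a minimal sketch that compares predicted and actual defect counts using mean absolute percentage error; the monthly figures are purely hypothetical placeholders.

    import numpy as np

    # Hypothetical monthly figures -- replace with the PPM's predictions and
    # the defect counts actually observed during system test.
    predicted = np.array([30, 25, 40, 35, 28])
    actual    = np.array([33, 22, 44, 31, 30])

    ape = np.abs(actual - predicted) / actual      # absolute percentage error per month
    mape = ape.mean() * 100                        # mean absolute percentage error

    print(mape)   # a consistently low MAPE is evidence the PPM is usable as-is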

One thing that can happen if you have not been careful with storing the raw data, or have not collected it at the proper level of detail, is that factors can be inadvertently combined and cancel each other out, yielding a factor that looks reasonable on the surface but may in fact not be. For example, if one element is high and the element it is combined with is low, the result will be somewhere in between. Think of two out-of-phase sine waves: each one by itself looks like a sine wave, but the combination is something else entirely.
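To make the sine-wave analogy concrete, here is a throwaway sketch of two out-of-phase sine waves that each look perfectly regular on their own but cancel almost completely when added together:

    import numpy as np

    x = np.linspace(0, 2 * np.pi, 9)
    wave_a = np.sin(x)             # first signal
    wave_b = np.sin(x + np.pi)     # same signal, half a cycle out of phase
    combined = wave_a + wave_b     # the "merged" measure

    print(np.round(wave_a, 2))     # clearly varies
    print(np.round(wave_b, 2))     # clearly varies
    print(np.round(combined, 2))   # essentially zero everywhere -- the structure is gone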

So, as an example, suppose you are collecting defect data (requirements defects, input defects, design defects, coding defects, test case defects, etc.) and you have neglected to use these categories to count and distinguish the different types of defects. Then the overall defect count probably won't correlate with anything, because the individual categories are cancelling each other out. This is the kind of data behavior that caused many of the early high maturity (HM) adopters to throw away their historical data and start over from scratch.

So it is very important to follow the Measurement and Analysis (MA) process area to properly define all of the base measures and to store them properly, so that when it comes time to perform the statistical analysis of the data, there is sufficient information available to allow different types of analyses and to isolate the specific information of interest.
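Here is a small, admittedly contrived sketch of that cancellation effect using invented numbers: two defect categories that each track module size strongly, but in opposite directions, so their total shows almost no correlation with size at all.

    import numpy as np
    from scipy import stats

    # Invented data for illustration only: module size in KLOC, plus two
    # defect categories that move in opposite directions as size grows.
    size_kloc      = np.array([2, 4, 6, 8, 10, 12, 14, 16])
    design_defects = np.array([3, 6, 8, 12, 14, 17, 20, 23])    # rises with size
    coding_defects = np.array([23, 18, 17, 14, 10, 9, 4, 3])    # falls with size
    total_defects  = design_defects + coding_defects

    print(stats.pearsonr(size_kloc, design_defects)[0])   # strong positive correlation
    print(stats.pearsonr(size_kloc, coding_defects)[0])   # strong negative correlation
    print(stats.pearsonr(size_kloc, total_defects)[0])    # near zero -- the signal cancels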

Hope this explanation helps.
