I want to add several caveats to this discussion of the statsmodels results output.

**Caveat 1**. Statsmodels has, frankly, terrible documentation. Several of the tests in the third row could be applied to a number of different things (e.g., to the distribution of the residuals from our model, as opposed to, say, the dependent variable), and the documentation doesn't specifically say what statsmodels applies them to when it generates the summary output. I'm going to *assume* that they are applied to the things they're supposed to be applied to, because the people who wrote statsmodels aren't stupid. This is almost certainly a safe assumption, but I really don't like making assumptions, so in a critical context I'd probably want to look up the statsmodels function for each test and apply it directly to the residuals myself.

Honestly, though, for regressions I'd probably just use a different language. R has much more fleshed-out documentation, and it's used by so many stats people that the toolkit for basic models like regression has been combed over by countless experts. If I were teaching a full-scale regression analysis class (as opposed to just touching on regression in the context of lots of other things where a real general-purpose programming language is more useful than a specialized stats language), I'd teach it in R. And if you find yourself doing regressions seriously, go learn R and do them there.

**Caveat 2**. No single diagnostic number or even collection of diagnostic numbers will tell you whether a regression is usefully specified or not. These are cues to further investigate, and to adjust (like a good Bayesian) your ultimate belief in the propositions that the model is supposed to support.

**Caveat 3**. We're not digging into this in a lot of detail here, and we probably won't have time to discuss this stuff in class. For practical lawyering purposes, the main point of seeing this kind of material is just to have a couple of extra questions to ask, or some red flags to watch for, when you see, for example, an expert report.

With those caveats in mind:

The *omnibus test* is a hypothesis test whose null hypothesis is that a distribution is normal. Assuming statsmodels is sensible here, this test is being applied to the residuals, so the important number is the `Prob(Omnibus)` entry: a low p-value suggests that we should probably worry about having non-normal residuals. The *Jarque-Bera test* (reported as `Jarque-Bera (JB)`, with its p-value in `Prob(JB)`) is another test for normality; both tests are built from the skew and kurtosis of the residuals, discussed below.
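To see what these two tests actually flag, here's a sketch run on two sets of hypothetical "residuals": one drawn from a normal distribution, and one deliberately skewed (centered exponential draws). It also recomputes the Jarque-Bera statistic by hand from its standard formula, $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, where $S$ is skew and $K$ is kurtosis. As far as I know, statsmodels' `omni_normtest` wraps SciPy's `scipy.stats.normaltest` (the D'Agostino-Pearson test), so the sketch checks that too.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.stattools import jarque_bera, omni_normtest

rng = np.random.default_rng(0)
# Hypothetical residuals: one roughly normal set, one clearly skewed set.
normal_resid = rng.normal(size=500)
skewed_resid = rng.exponential(size=500) - 1.0  # long right tail, mean ~0

def jb_by_hand(e):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(e)
    s = stats.skew(e)
    k = stats.kurtosis(e, fisher=False)  # plain kurtosis; 3 for a normal
    return n / 6.0 * (s**2 + (k - 3.0) ** 2 / 4.0)

for label, e in [("normal", normal_resid), ("skewed", skewed_resid)]:
    omni_stat, omni_p = omni_normtest(e)
    jb_stat, jb_p, _, _ = jarque_bera(e)
    print(f"{label}: Prob(Omnibus)={omni_p:.4f}  Prob(JB)={jb_p:.4f}  "
          f"JB={jb_stat:.2f} (by hand: {jb_by_hand(e):.2f})")
```

The skewed set should get tiny p-values from both tests, which is exactly the "worry about non-normal residuals" signal.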

The *Durbin-Watson statistic* is a measure of autocorrelation in the residuals. As I said, this is mostly important for time series data, which comes up most commonly in the context of financial modeling. I understand from the folks who do this work that the statistic ranges from 0 to 4, and that the closer it is to 2, the less reason there is to worry about autocorrelation (values well below 2 suggest positive autocorrelation, values well above 2 negative autocorrelation); generally, 1.5 to 2.5 is OK.
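The Durbin-Watson statistic is simple enough to compute by hand: the sum of squared differences between successive residuals, divided by the sum of squared residuals. It's roughly $2(1 - r)$, where $r$ is the correlation between each residual and the one before it, which is why "near 2" means "no autocorrelation." This sketch compares independent errors with hypothetical AR(1) errors in which each error carries over 80% of the previous one. Note that it only makes sense if the rows are in time order.

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

def dw_by_hand(e):
    """Sum of squared successive differences divided by sum of squares."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e**2)

rng = np.random.default_rng(1)
n = 1000
white = rng.normal(size=n)  # independent errors: expect DW near 2

# Hypothetical positively autocorrelated AR(1) errors: expect DW near 2 * (1 - 0.8).
rho = 0.8
ar1 = np.empty(n)
ar1[0] = rng.normal()
for t in range(1, n):
    ar1[t] = rho * ar1[t - 1] + rng.normal()

for label, e in [("independent", white), ("AR(1), rho=0.8", ar1)]:
    print(f"{label}: DW={durbin_watson(e):.3f} (by hand: {dw_by_hand(e):.3f})")
```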

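*Skew and kurtosis* are measures of how far a distribution departs from normal. Skew is about whether it's asymmetrical (more of the data piled up on the left or the right, with a longer tail on the other side); kurtosis is about how spiky it is, or more precisely how heavy its tails are. A normal distribution has a skew of zero (a sample from one should be close to zero), and any normal distribution, not just the standard normal, has a kurtosis of 3. Some software instead reports *excess* kurtosis, which subtracts 3 so that a normal scores 0; the statsmodels summary reports the plain version. Here's a good read from NIST on skew and kurtosis. You'll hear different things about acceptable ranges for each.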
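A quick way to build intuition (and to check the "kurtosis of 3 for any normal" point) is to compute both measures on big simulated samples with SciPy's `scipy.stats.skew` and `scipy.stats.kurtosis`. The samples are hypothetical: a normal with mean 5 and standard deviation 10 (deliberately not the standard normal), and an exponential, which has a long right tail. Watch the `fisher` parameter: SciPy defaults to excess kurtosis (`fisher=True`), while `fisher=False` gives the plain version.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 100_000
samples = {
    "normal": rng.normal(loc=5, scale=10, size=n),  # mean 5, sd 10: not standard normal
    "exponential": rng.exponential(size=n),         # long right tail
}

for label, x in samples.items():
    skew = stats.skew(x)
    kurt = stats.kurtosis(x, fisher=False)  # plain kurtosis: normal -> about 3
    excess = stats.kurtosis(x)              # default fisher=True: normal -> about 0
    print(f"{label}: skew={skew:.2f}  kurtosis={kurt:.2f}  excess={excess:.2f}")
```

For reference, the exponential distribution's theoretical skew is 2 and its kurtosis is 9, so its sample values should land near those.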

We've already talked about the condition number---and about how the example data is just full of All the Multicollinearity.

These aren't the only regression diagnostic tests statsmodels offers; check out its documentation for more. But this will do as a summary for now.