Blinkdagger is proud to have Rob Slazas, an R&D engineer and Six Sigma Black Belt, head up a new series on statistics in MATLAB. Click here to see a full list of posts from Rob.


MATLAB and the Statistics Toolbox have a nice complement of functions useful in statistical analyses. For our first look into this area, let's get a mile-high view of the different function categories and how they are used.

Like all things MATLAB, these functions benefit from the language being highly customizable, and from the File Exchange (FeX), where you can borrow from those who have solved the problem before you. Some folks say the stat functions' output is not as polished as that of a dedicated, commercial stat package. But in our survey below, I think you'll find nearly everything you need is either included, available at the FeX, or a short bit of code away.

A Little Background

Just to make sure we're on the same page here, what are 'Statistics' anyway? Philosophies vary, but at the heart of it, stats are a collection of mathematical procedures (tools) meant to help us make decisions or answer a question using imperfect or incomplete data.

Statistics are also (in)famous for being misused, since they are susceptible to the selection bias of the user. The most common problems are applying statistical tools to a question that they really aren't capable of answering, or analyzing only the favorable data. When you run a procedure on data, it will still return an "answer" whether it is appropriate or not.

I like the way Tukey (a great 20th century mathematician) said it:

   "Far better an approximate answer to the right question, which is often vague,
 than an exact answer to the wrong question, which can always be made precise."

Types of Stat Functions

All of the main categories of stat functions are represented in the Statistics Toolbox. In the MATLAB help you'll find a great bit of detail on their usage. You may also find that some of these functions are more "mainstream" than others. I would say - if you haven't heard of a particular function before, caveat emptor! Crack a book first and make sure you will get what you intend from using it.

Now some detail about the contents of the Statistics Toolbox…

Descriptive Statistics

Descriptive statistics tell you particular things about the data that you have on hand, and nothing more. Common descriptive statistics are the mean, median, mode, min value, max value, range, and std (standard deviation).
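As a quick illustration of the descriptive statistics listed above, here is a minimal sketch on a small made-up vector. The data and variable names are assumptions for illustration only; range comes from the Statistics Toolbox, while the rest are core MATLAB functions.

```matlab
% Made-up sample data (illustrative only)
x = [2 4 4 5 7 9];

xMean   = mean(x);     % arithmetic average
xMedian = median(x);   % middle value of the sorted data
xMode   = mode(x);     % most frequent value
xMin    = min(x);      % smallest value
xMax    = max(x);      % largest value
xRange  = range(x);    % max minus min (Statistics Toolbox)
xStd    = std(x);      % sample standard deviation (normalized by n-1)

fprintf('mean %.3f, median %.1f, mode %d, range %d, std %.3f\n', ...
        xMean, xMedian, xMode, xRange, xStd);
```

Each of these numbers describes only the six values in x; none of them says anything about data you did not collect.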

Inferential Statistics

Inferential statistics attempt to take your data a step further and draw some conclusion(s). Most functions combine your data with a set of assumptions, and return a result that does more than just describe the dataset. This strategy can be very economical and powerful, since it allows you to work from a sample instead of gathering very large amounts of data. Of course, the danger is that the assumptions of the function are not appropriate for your data, in which case you could be misled.

One of the most common assumptions is that data comes from a "normally distributed" population. Since the characteristics of the Normal distribution are well understood, only a small sampling of data is needed to make inferences about the entire population of data from which the sample came. Not all functions make this assumption, but those that do benefit from leveraging a huge amount of previous work.
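To make the normality assumption concrete, here is a hedged sketch using two Statistics Toolbox functions: lillietest (one of several available normality checks) and ttest (a one-sample t-test, which assumes a normally distributed population). The sample values and the hypothesized mean of 5 are made up for illustration.

```matlab
% Made-up sample of 10 measurements (illustrative only)
x = [5.1 4.9 5.6 5.8 6.0 5.3 5.5 4.7 5.9 5.2];

% Check the assumption first: hNorm = 0 means the test found no
% evidence against normality at the default 5% significance level
hNorm = lillietest(x);

% One-sample t-test of the null hypothesis "population mean is 5".
% h = 1 rejects the null at the 5% level; ci is a 95% confidence
% interval for the population mean.
mu0 = 5;                      % hypothesized mean (an assumption)
[h, p, ci] = ttest(x, mu0);

fprintf('normality rejected: %d, t-test h = %d, p = %.4f\n', hNorm, h, p);
fprintf('95%% CI for the mean: [%.3f, %.3f]\n', ci(1), ci(2));
```

Note that with only ten points, a normality test has little power to detect departures from normality, so a passing result is weak reassurance rather than proof.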

The categories of inferential statistics functions in MATLAB are:

  • Hypothesis Testing
  • Regression
  • Multivariate Methods
  • Cluster Analysis
  • Probability Distributions
  • Markov Models
  • Statistical Process Control (SPC)
  • Analysis of Variance (ANOVA)
  • Design of Experiments (DOE)
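As one small taste of these categories, the sketch below runs a one-way ANOVA with anova1 on a made-up matrix in which each column is a group. The data are invented for illustration, and the 'off' option simply suppresses the table and box plot figures.

```matlab
% Made-up data: 4 observations (rows) in each of 3 groups (columns)
y = [4.1 5.0 6.2;
     3.9 5.4 6.0;
     4.3 5.1 6.5;
     4.0 4.8 6.1];

% One-way ANOVA: null hypothesis is that all column means are equal.
% Passing [] for the group argument treats each column as its own group.
p = anova1(y, [], 'off');

if p < 0.05
    disp('At least one group mean appears to differ.');
else
    disp('No evidence that the group means differ.');
end
```

A small p-value only says that some difference exists; a follow-up such as multcompare is the usual way to see which groups differ.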

Statistical Visualization

As with most things, visualization (pretty graphs) is the "cool" part of the toolbox. Some things just seem to make sense immediately when graphed, so these plots make excellent communication tools. As you might expect, MATLAB does not disappoint in the visualization department. With no fewer than 46 functions dedicated to statistical visualization, and the ability to programmatically change the output, we are well taken care of here.

Worth mentioning are some of my favorites, with a selection displayed below with dummy data: histfit (draws a histogram of your data with a normal curve fit overlaid), boxplot, wblplot (Weibull distribution probability plot), randtool (an interactive random number generator), and polytool (an interactive polynomial fitting tool).

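% Dummy data: 30 samples in each of 4 columns, with differing spread and offset
fudge = repmat(1:4, 30, 1);
dat = randn(30, 4).*fudge + 3*sin(fudge);
figure('Position', [100 100 600 300]);
subplot(1,2,1); histfit(dat(:));               % histogram with normal fit overlaid
subplot(1,2,2); boxplot(dat, 'notch', 'on');   % notched box plot, one box per column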


The "Other" Functions

Bringing up the tail end of the toolbox is a collection of helpful things for handling data and making GUIs. They fall into the following categories of function:

  • File I/O
  • Organizing Data
  • Classification
  • Graphical User Interfaces
  • Utility Functions

These can be particularly useful for aggregating large amounts of data, or for datasets with relationships that are important to preserve.

Wrapping Up

If you are new to statistics, or just new to statistics in MATLAB, hopefully this overview helps you dig in a little more. Those of you who want to see specific applications or functions covered in a future blog post, please reply below with your suggestions.