Statistics and Machine Learning

Chinh Ho
3 min read · Sep 6, 2021

There is considerable overlap among these fields, but some distinctions can be made. Of necessity, I will have to oversimplify some things or give short shrift to others, but I will do my best to make sense of these areas.

Firstly, Artificial Intelligence is quite different from the rest. AI is the study of how to create intelligent agents. In effect, it is a way of programming a computer to behave and perform a task as an intelligent agent (say, a person) would. This need not involve learning or induction at all. It could just be a way to ‘build a better mousetrap’. For example, AI applications have included programs that monitor and control ongoing processes (e.g., increase aspect A if it seems too low). Note that AI can include almost anything a machine does, as long as it does not do it ‘stupidly.’

In practice, however, most tasks that require intelligence require the ability to generate new knowledge from experience. Thus, a broad field within AI is machine learning. A computer program is said to learn some task from experience if its performance at the task improves with experience, according to some performance measure. Machine learning involves studying algorithms that can extract information automatically (i.e., without online human instruction). It is certainly the case that some of these processes include ideas derived directly from, or inspired by, classical statistics, but they need not be. Like AI, machine learning is a very broad field and can include almost anything, as long as there is some inductive component. An example of a machine learning algorithm might be the Kalman filter.
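To make the Kalman filter example concrete, here is a minimal sketch of a one-dimensional filter in Python. The noise variances and the toy measurements are illustrative assumptions, not values from any real application.

```python
# Minimal one-dimensional Kalman filter: alternately predict the hidden
# state and correct that prediction with a noisy measurement.
def kalman_1d(measurements, process_var=1e-3, meas_var=0.25):
    estimate, error = 0.0, 1.0  # initial state estimate and its variance (assumed)
    history = []
    for z in measurements:
        error += process_var               # predict: uncertainty grows with process noise
        gain = error / (error + meas_var)  # Kalman gain: how much to trust the measurement
        estimate += gain * (z - estimate)  # update the estimate toward the measurement
        error *= 1.0 - gain                # updated uncertainty shrinks
        history.append(estimate)
    return history

# Noisy readings of a roughly constant signal near 1.0 (made-up data).
print(kalman_1d([0.9, 1.1, 0.95, 1.2, 1.05]))
```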
Data mining is an area that draws its inspiration and techniques from machine learning (as well as from statistics) but is put to different purposes. Data mining is performed by a person, in a particular situation, on a particular data set, with a goal in mind. Typically, this person wants to leverage the power of the various pattern recognition techniques that have been developed in machine learning. Often, the datasets are enormous and complex, and may pose unique problems (such as having more variables than observations). Usually, the goal is either to discover or generate some preliminary insights in an area where there is little prior knowledge, or to predict future observations accurately.
Furthermore, data mining procedures can be ‘unsupervised’ (we do not know the answer: discovery) or ‘supervised’ (we know the answer: prediction), as sketched below. Note that the goal is generally not to develop a more sophisticated understanding of the underlying data generation process. Standard data mining techniques would include cluster analysis, classification and regression trees, and neural networks.
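As a small illustration of the unsupervised/supervised split, here is a hedged sketch using scikit-learn (assumed to be installed); the four-point dataset and its labels are made up for demonstration.

```python
# Contrast unsupervised discovery with supervised prediction on toy data.
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

X = [[1, 2], [1, 4], [10, 2], [10, 4]]  # four made-up observations
y = [0, 0, 1, 1]                        # answers, available only in the supervised case

# Unsupervised: discover grouping structure without knowing the answer.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print("discovered clusters:", clusters)

# Supervised: learn from known answers, then predict a new observation.
tree = DecisionTreeClassifier().fit(X, y)
print("prediction for [9, 3]:", tree.predict([[9, 3]]))
```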
I suppose I don’t need to say much to explain what statistics is, but perhaps I can say a few things. Classical statistics (here I mean both frequentist and Bayesian) is a sub-topic within mathematics. I think of it largely as the intersection of what we know about probability and what we know about optimization. Although mathematical statistics can be studied as a purely Platonic object, it is mostly more practical and applied in character than other, more rarefied areas of mathematics. As such (and notably in contrast to data mining above), it is mostly used to gain insight into some particular data generation process. Thus, it usually starts with a formally specified model, from which procedures are derived to extract that model from noisy instances (i.e., estimation, by optimizing some loss function) and to distinguish it from other possibilities (i.e., inference based on known characteristics of the sampling distribution). The prototypical statistical technique is regression.
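To make that last point concrete, here is the standard textbook instance (not specific to this post): in ordinary least squares regression, the formally specified model is y = Xβ + ε, estimation amounts to minimizing a squared-error loss, and inference rests on the known sampling distribution of the estimator (assuming normal errors):

```latex
\hat{\beta} \;=\; \arg\min_{\beta} \|y - X\beta\|^2 \;=\; (X^{\top}X)^{-1}X^{\top}y,
\qquad
\hat{\beta} \sim \mathcal{N}\!\big(\beta,\; \sigma^2 (X^{\top}X)^{-1}\big).
```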
