In the problem of communicating information from point A to point B, we are faced with the task of dealing with a noisy channel that sits between A and B and corrupts or distorts the signal fed into it. A and B can be points in time intend of points in space - for e.g. recording data on some medium, such as a compact disk. Our goal is to take information produced by a source, and map it to a signal that can be carried by our intended channel, in such a way that when this signal is fed into the channel at point A, the distorted version produced by the channel at point B can be used by us to faithfully recover the original information with little or no change in meaning of the original information. In solving this problem, we first try to understand the nature of distortion that the channel creates, and based on this knowledge, we design an appropriate mapping between the original information emitted by the information source, and the signal that is fed into the channel. This process is called encoding. Using the knowledge of the type of distortion produced by the channel, we can then use the distorted signal at B and map it to our guess of the original information emitted by the source. This process is called decoding. The study of how to create efficient encoding and decoding strategies for specific types of channels is called coding theory.In machine learning, we are placed at point B and at point A is a source of information that is not under our control. There is no entity sitting at point A creating a mapping from the original information to a signal appropriate for the channel in order to better withstand the distortions the channel will produce. The channel between A and B is 'nature'. We are faced without the task of figuring out the original information at the information source by observing the signal emitted by the channel at B. To do this, we attempt to learn a function that can correctly map the signal output by the channel at B to the original information at A. In terms of the communication problem, we are attempting to learn the decoding rule in the communication problem above, even though we do not have prior knowledge about the nature of distortion produced by the channel. In supervised learning, we are given several input-output pairs of the channel, and we attempt to learn this function that correctly maps the output to the input. Once we have estimated this function, we apply it to estimate the original information for instances where only the output of the channel is known.