Deep learning thrives on problems where automatic feature learning is crucial and lots of data are available. Large amounts of data are needed to optimize the large number of free parameters in a deep network. In many domains, that would benefit from automatic feature learning, large amounts of data are not available, or not available initially. This means that models with low capacity must be used with hand crafted features, and if large amounts of data become available at a later time, it may be worthwhile to learn a deep network on the problem. If deep learning really reflects the way the human brain works, then I imagine that the network being trained by a baby must not be very deep, and must have few parameters than networks that are trained later in life, as more data becomes available. This is reminiscent of Bayesian Nonparametric models, wherein the model capacity is allowed to grow with increasing amounts of data.