Understanding Happiness

Happiness is as hard to define as it is to achieve. Everybody wants to be happy. Even masochists. I think it is best to use  a non-constructive definition:

Happiness is the goal that drives all human actions and desires.

If long term happiness if everybody's ultimate goal, then it is worth learning how to achieve long term happiness. In fact, if being happy is the ultimate goal (as opposed to say, being wealthy), then our education system should also be teaching us how to be happy over a life time, rather than purely technical or vocational skills. Simple GDP growth does not imply an increase in the happiness of a society -- as indicated by data from the last ~40 years in the US, comparing per capita GDP and happiness levels:

Screen Shot 2012-12-17 at 3.09.43 PM

While per capita GDP has risen more or less steadily, happiness levels have remained more or less stagnant in the last ~40 years.

Should countries develop public policy with the goal of making a society happier, rather than with the goal of increasing GDP? I think it is an idea worth exploring (Scandinavian countries seem to rank highest in in the world in happiness scores, despite high taxes). The government of Bhutan came up with the Gross National Happiness index, which measures the average life satisfaction of the citizens in a country.


This correlates well with health, access to education, and wealth (GDP per capita). At any given time, the relationship between average happiness of a country and per capita GDP seems to log-linear, meaning that happiness is roughly linear in the log of the per capita GDP.

Screen Shot 2012-12-15 at 6.07.29 PM

This is because in order to increase the happiness level of a society by 1 unit, the increase in wealth required is proportional to the current welath. For e.g., if the required amount of increase in personal wealth for a group with per capita income of $1000 is $x, then it is $10x for a group with per capita income of $10,000.

Near the end of this talk, Daniel Kahneman says that in a study done with the Gallup organization, he found that:

Below an income of … $60,000 a year, people are unhappy, and they get progressively unhappier the poorer they get. Above that, we get an absolutely flat line. … Money does not buy you experiential happiness, but lack of money certainly buys you misery.

Kahneman distinguishes between two types of happiness: that of the experiencing self and that of the reflecting self. It is possible to be happy in the experiencing self but have a poor happiness score when reflecting on a long time frame in the past, and vice-versa. For the type of happiness that measure life satisfaction in retrospect, there is no flat time -- i.e. it continues to increase with increasing wealth. I don't find this too surprising. It is the difference between short term and long term happiness. It is easy to be happy in the short term at the expense of the long term. On the other hand, tolerating displeasure during hard work in the present can have a huge payoff in long term happiness in the future.

In this TED talk, Dan Gilbert showcases his research that shows that happiness can be synthesized by individuals. So happiness is not some finite resource that needs to be distributed among people, instead one can simply choose to be happy, despite seemingly adverse conditions. This is fascinating, because it provides experimental evidence that happiness has to do not just with our external circumstances (such as GDP per capita), but also with how we process information in our minds. Several religions have the concept of expressing gratitude. The act of being grateful basically synthesizes happiness out of thin air.

Defining Intelligence

Intelligence is a slippery thing to define. The following definition recently struck me:

That which allows one to maximize happiness over long term.

I like this definition because it is short (c.f. MDL, Occam's Razor), it makes logical sense, and it carries a lot of meaning without going into details of how to be intelligent. It is logical to me because of the following argument: Suppose a person is allowed to life two versions of his life starting from some fixed point in his life. All events and circumstances in the two versions are the same except for actions taken by the person. Then he can be said to be more intelligent in that version of his life in which he achieves greater happiness over the rest of his life.

Intelligence is needed in order to understand what actions will make us happy, for how long, and whether there will be any effects of those actions on our future happiness. Making decisions to maximize cumulative happiness is certainly a non-trivial task. Sometimes one must put oneself through short-term adversity (e.g. graduate school at little on no stipend, or an athlete undergoing gruelling training for a race) to be able to do well later. Sometimes, one decides to undertake an action that provides short term happiness, but at the cost of long term happiness. It takes intelligence to learn to avoid such behaviour n the future.

Modern definitions of intelligence from the scientific and psychology community are incredibly long-winded [Wikipedia]

A very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings—"catching on," "making sense" of things, or "figuring out" what to do.


Individuals differ from one another in their ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought. Although these individual differences can be substantial, they are never entirely consistent: a given person's intellectual performance will vary on different occasions, in different domains, as judged by different criteria. Concepts of "intelligence" are attempts to clarify and organize this complex set of phenomena. Although considerable clarity has been achieved in some areas, no such conceptualization has yet answered all the important questions, and none commands universal assent. Indeed, when two dozen prominent theorists were recently asked to define intelligence, they gave two dozen, somewhat different, definitions.

The same Wikipedia page also lists various different definitions given by researchers. The long-windedness of these definitions is somewhat excusable as an attempt to be all-inclusive and general. But in the end, the notion of intelligence is a man-made model, invented to try and explain phenomena. I think a focus on happiness as the central phenomenon to be explained goes a long way in simplifying our understanding of intelligence.

Average age of first-time Mothers

The average age of first time mothers in the developed countries of the world has been rising for the last ~40 years.

Here is another plot that shows the rate of occurrence of Down Syndrome, a chromosomal defect, as a function of the age of the mother at the time of child birth.

The curve really starts to shoot up at 30. In the UK, the average age of a first time mother is 30 years. It is well known that the fertility rate in women decreases after the age of 30 and drops rapidly after 35. Older mothers are likely to find it harder to have a baby and if they do, then they run a higher risk of chromosomal defects. Given the possibilities of all these negative consequences, the increase in the average age is a bit disturbing. It seems like there is a hidden cost to more women working and for longer.

Why is it that women are waiting longer before having their first born despite the risks? Most of my hypotheses (of which more than one, or none, may be true) have to do with women working:

  • Having invested significantly into an education, greater number of women are entering the workforce, with the desire to be financially independent.
  • There is greater financial pressure in families for women to work.
  • The policies of workplaces in these countries are not favourable to childbirth. I can see this is true in the US, but I doubt this holds for Wester European countries, which I know have policies favorable to child birth.

One source of further information is the following map, showing the absolute increase, in years, in the age of a first time mother, over the last 40 years, state by state in the US:

This number is highest in the North Eastern states of NY, NJ, MA, CT, etc. The intensity of the colors in the map above correlates well with population density, and with economic activity in general (meaning more working women).  Here are two more plots I came across in a US based study done by P&G, that suggest that at least in the US, employer policies may be responsible.

What would a mother value most from their employers?:

How much guilt to mothers feel about work-life balance?:

What is the biggest problem in the world?

I have been posing this question to friends and acquaintances (and to myself) in one form or another for a while now. The answers I have received have varied significantly. I am not the first to pose this question of course. Here is one of several online polls, posing the same question, with 700+ responses so far. Here are some others. Some of the responses I have received personally and gathered from various online postings like the ones above, in no particular order include:

Environmental change Poverty, Hunger, clean drinking water
The P vs NP problem Ego
War Communication between people
Lack of tolerance Jobs, economy
Ignorance Fear
Greed Lack of genuine love, hatred
Religion Racism
Moral decline Energy shortage
Sin Drugs
Terrorism Apathy, lack of empathy
Anger Pollution
Love for Money Politics
Forgetfulness of God Overpopulation, limited world resources
Toxic waste Consumerism
Death Selfishness
HIV All –isms: Nationalism, sexism, racism..
Cancer Envy

One of the problems with the way the question is posed above is that it does not specify what 'problem' and 'biggest' mean.

Define problem. We will define problem as 'That which brings suffering to humans'.

Define biggest.  Biggest could mean 'one that affects the largest number of people', 'the scientific problem that would create the biggest impact if solved', or 'one with the greatest economic impact', etc. I am interested in a specific version of this question, in which 'biggest' means 'most fundamental', i.e. one which can be said to be a root cause of many other problems.

Causal structure. A natural question to pose in order to move in the direction of getting an answer to my version of 'biggest problem' is: how many degrees of freedom are really present in the above responses (and what are they)? That is, are they all independent problems, or do they stem from a relatively small set (1-2) of root causes (with others being effects)? For example, lack of tolerance and energy shortage can be said to be causes of war.  It is also clear that not all the problems listed above are at the same level of generality -- some seem intuitively more abstract or fundamental than others. For e.g., war seems more in the realm of effects or symptoms, compared to say anger, fear or greed. In other words, even though they are all problems, some of the items in the list above are really effects rather than causes, and I am interested in the causes. To restate the question properly:

What is the true causal structure of the world problems?

Here is a small toy example of what I mean by causal structure:

An arrow from A to B indicates 'A causes B'. In the above example, energy shortage is stated to be a cause for war, and lack of tolerance is also stated as a cause for war. Also, once energy shortage is taken into account as a cause for war, then war is not caused by overpopulation or consumerism. In other words, overpopulation and consumerism do lead to war, but only through energy shortage.

One correct answer. What strikes me most about the restated question above is that there must exist a definite answer to it. That is, there is an objective reality associated with the question. The causal structure is not a matter of subjective opinion. There is one true structure of cause and effect. I am not claiming the number of independent root causes at the very top of the causal structure is 1 (perhaps this is the case). All I am saying there is one definite causal structure. The 'one correct answer' aspect is interesting because while it is arduous to build a causal structure, checking whether a proposed structure makes sense should be much easier.

I am looking for this causal structure. I think that gaining an understanding of the causal structure can be more insightful than an understanding of the each of the problems in isolation [1]. If you think you have have a causal structure of even part of the list of problems above, please write to me or leave me a comment. If you contact me with a proposed causal structure, please use the following format:

Cause1 -> Effect1 Cause2 -> Effect2

and so on, with one cause-effect pair per line. For the above toy example, this would be:

Overpopulation -> Energy shortage Consumerism -> Energy shortage Energy shortage -> War Lack of tolerance -> War

Think of this as a jigsaw puzzle, in which the problems are the blocks (feel free to pick whatever set of problems you want from the above list, or otherwise. Of course, the more complete the set, the better.), and one has access to as many arrows as needed (The fewer the arrows, the better).



[1] I think this may be true in general. In middle school I recall homework and exam questions in various subjects asking us to fill in the blanks or match entries in column A with the entries in column B. I feel explaining the causal structure between a set of things would make a very instructive exercise in school because it would force a student to think.

In the eye of the Beholder

Why is the picture on the right more appealing than the one on the left?

What is it that we find more interesting about the picture on the right, compared to the one on the left? The picture on the left contains more information. So we are certainly not looking for more information. One might say we don't know how to interpret the image on the left into anything familiar, but it is television static. A more precise answer is given by Jurgen Schmidhuber, who argues convincingly that:

Artists (and observers of art) get rewarded for making (and observing) novel patterns: data that is neither arbitrary (like incompressible random white noise) nor regular in an already known way, but regular in way that is new with respect to the observer's current knowledge, yet learnable (that is, after learning fewer bits are needed to encode the data).

This explains the pictures on top. The picture on the left is not compressible because it is a matrix of uniformly random 0/1 pixels. The Monet on the right evokes familiar feelings, and yet adds something new. I think what Schmidhuber is saying is that the amount of compressibility should neither be too little, nor too much. If  something is not very compressible, then it is too unfamiliar. If something is too compressible, then it is basically boring. In other words, the pleasure derived first increases and then decreases with the compressibility, not unlike this binary entropy curve.

Let us ask the same question again for the following pair of images (you have to pick one over the other):

My guess is that most people will find the image on the right more appealing (it is for me at least). Please drop me a comment with a reason if you differ. When I look at the image on the right, it feels a little more familiar, there are some experiences in my mind that I can relate to the image - for example looking straight up at the sky through a canopy of trees (white = sky, black=tree leaves), or a splatter of semisolid food in the kitchen.

In order for an object to be appealing, the beholder must have some side information, or familiarity with the object beforehand. I learnt this lesson the hard way. About 2 years ago, I gave a talk at a premier research institution in the New York area. Even though I had received complements when I'd given this talk at other venues, to my surprise, this time, audience almost slept through my talk. I learnt later that I had made the following mistake: in the abstract I'd sent to the talk's organizer, I had failed to signal that my work would likely appeal to an audience of information theorists and signal processing researchers. My audience had ended up being a bunch of systems researchers. The reason they dozed through my talk was that they had just a bit less than the required background to connect the dots I was showing them.

It is the same with cultural side information or context -- the familiar portion of the object allows the observer to latch on. The extra portion is the fun. Without the familiar, there is nothing to latch on to. The following phrases suddenly take on a precise quantifiable meaning:

  • "Beauty lies in the eyes of the beholder": the beholder carries a codebook that allows her to compress the object she is observing. Each beholder has a different codebook, and this explains 'subjective taste'.
  • "Ahead of its time": Something is ahead of its time if it is very good but does not have enough of a familiar portion to it, to be appreciated by the majority of observers.

I can think of lots of examples of art forms that deliberately incorporate partial familiarity into them -- e.g. music remixes, Bollywood story lines. Even classical music first establishes a base pattern and then builds on top of it. In this TED talk, Kirby Ferguson argues that all successful creative activity is a type of remix, meaning that it builds upon something familiar.


  1. When writing a paper or giving a talk, always make sure the audience has something familiar to latch on to first. Otherwise, even a breakthrough result will appear uninteresting
  2. Ditto for telling a story or a joke, DJing music at a party, or building a consumer facing startup. Need something familiar to latch on to.
  3. In some situations it may be possible to gauge the codebook of the audience (e.g. having a dinner party conversation with a person you just met), to make sure you seem neither too familiar, nor too obscure.

Climate change and The Burglar's Stopping Problem

The climate change hypothesis is that global changes in climate leading to significantly higher number of severe weather events are predominantly man-made, and in particular, the release of greenhouse gases such as carbon-dioxide into the atmosphere is a leading cause. After conveniently escaping the national spotlight in the US during the presidential campaigns, climate change has once again appeared in the news, thanks to Hurricane Sandy. Munich-Re, the reinsurance giant released a report, somewhat presciently on Oct 17, that says:

Nowhere in the world is the rising number of natural catastrophes more evident than in North America. The study shows a nearly quintupled number of weather-related loss events in North America for the past three decades, compared with an increase factor of 4 in Asia, 2.5 in Africa, 2 in Europe and 1.5 in South America.

Unambiguously proving that man-made climate change has a role to play in a specific event such as Sandy is more of an ideological debate, than a statistical exercise. And so there are many many people who can boldly claim that man-made climate change is fiction.

A few industrialized nations are responsible for the bulk of CO2 emissions.

Some of these nations have refused to ratify the Kyoto Protocol, which calls for reduction in CO2 emissions. No points for guessing which colors in the map below denote countries that have not ratified the treaty.

(Brown = No intention to ratify. Red = Countries which have withdrawn from the Protocol. Source: Wikipedia)

Most of the world apparently believes in man-made climate change. When will these other countries wake up? I can't help but think of the following stopping problem:

Taken from G. Haggstrom (1966) “Optimal stopping and experimental design”, Ann. Math. Statist. 37, 7-29.

A burglar contemplates a series of burglaries. He may accumulate his larcenous earnings as long as he is not caught, but if he is caught during a burglary, he loses everything including his initial fortune, if any, and he is forced to retire. He wants to retire before he is caught. Assume that returns for each burglary are i.i.d. and independent of the event that he is caught, which is, on each trial, equally probable. He wants to retire with a maximum expected fortune. When should the burglar stop?

Traffic Lights: Regulation vs. free-markets

In the aftermath of the Sandy Hurricane, many parts of the NY/NJ area have sustained power outages, and as a result, traffic lights in these areas are not functional. This requires drivers to approach a traffic junction as a multi-way stop-sign. This got me thinking: What if, in place of traffic lights, we had just stop signs everywhere, and the rule was: the next car to go should be the car at the head of the longest queue. I believe this is an optimal scheduling policy in a certain sense (it provides an optimal throughput x delay product -- that is for a given average delay at the intersection, it would provide the highest rate number of cars going through [1] ). In this policy, each driver is trusted to follow the scheduling policy faithfully. For argument sake, I am ignoring (1) the time spent by each driver having to figure out which queue is the longest at each step, (2) how the driver at the head of each queue gets information about the length of each queue, and (3) the loss in efficiency incurred by slowing down and starting. Compared to this self-enforced scheduling policy, traffic lights can be very suboptimal. You know this if you have ever stood on a red light waiting to turn green while the street with the green signal has no traffic. Why then do we have traffic lights? The problem is that in the self-enforcing scheduling policy, there will be some drivers who will free-load, i.e. they will not obey the rule and simply take the turn for themselves, even if the turn belongs to someone else according to the scheduling rule. Further, when this happens, it will often result in collisions between the free loader and the rightful owner of the turn. This is why traffic lights are necessary, even though they come at the expense of reduced overall efficiency.

There is a nice lesson embedded here that speaks to the need for government regulation by way of analogy: Regulation is necessary to enforce fairness and safety by preventing freeloaders and accidents, even though a free market might provide higher overall benefit if everyone was guaranteed to behave properly. Therefore regulation is the price we must pay, in the form of reduced overall benefit, to counter the fact that all market participants do not behave as per the rules if left to themselves.

EDIT 1: The loss in overall utility when all participants are allowed to act selfishly, compared to the state where each participant acts for the overall good of the set of all participants, is called the price-of-anarchy. This is different from (but related to) the loss in overall utility from the imposition of regulations. A simple 2-player prisoner's dilemma can exhibit the price of anarchy when all participants are worse off if allowed at act selfishly, compared to the overall optimal for the 2 players. In the traffic light example, when players act selfishly, they create unfairness and also end up endangering everyone (including themselves, but perhaps they don't realize this bit). Hence the utility derived by each participant is lower, compared to if they all cooperated perfectly.

EDIT 2: Regulation can be thought of simply as a mechanism designed to improve the utility received by players beyond what it would be in anarchy, by changing the (rules of the) game a little. Regulation typically doesn't take the system to the overall optimal (which corresponds to perfectly cooperating players in the original game) of the original game. The 'price of regulation' ( = utility of overall optimum - that achieved by regulation) should be less than the price of anarchy (= overall optimum - state achieved by anarchy). Modern day regulators need to be really good at mechanism design!

EDIT 3: Perfect cooperation can be unstable against defection by free loaders [2] because the utility a player derives by unilaterally defecting is greater than that obtained by cooperating. If everyone is well aware of the risk of an accident upon defecting, then this can serve as a disincentive to defecting because the utility from defecting, after factoring in the probability of an accident may no longer make defecting worthwhile. This suggests that simply increasing awareness of the risks posed by misbehavior upon the misbehaving player, might improve the overall equilibrium a bit. Of course, this requires that the defector bear extra personal risk.


[1] I know this because it holds true for scheduling packets transmissions in a class of communication networks [citation required].

[2] I experienced free loaders first hand during the last few days after Sandy in 2 different contexts: people going out of turn at road intersections, and people trying to break into the long line at a gas station.

Non-linearities in Life

Conventional wisdom says that the marginal gains accrued per unit increase in effort are diminishing. In some cases this does not hold, and the gains seem to be super-linear, rather than sublinear in the amount of effort. Consider for a moment that you are driving in your car, let's call it Car 1, down a straight long road that has a series of traffic lights. A second car, lets call it Car 2, is driving near you at a speed that is only very slightly greater than yours. The other car is inching ahead of your car very slowly. At time t, the distance by which Car 2 is ahead of you is d(t).

As you near the next traffic signal, the light turns amber and you begin to slow down. But the driver of the other car doesn't feel the need slow down, but instead makes it through the light without stopping, while you wait at the traffic signal. The gap between the two cars has suddenly increased. If the two cars were to maintain the same small different in speeds throughout the road, slowing and stopping only at traffic lights, then at time t, the distance d(t) between them will be a non-linear function, quite different from the linear function that it would have been, had there been no traffic lights. What do you think d(t) would look like?

It's not hard to reason out the shape of d(t). At any given time, either both cars are moving,, in which case, they have a relative velocity of v2-v1, or car 1 is stopped at a signal (relative velocity of v2), or car 2 is stopped at a signal (relative velocity of -v1). So the plot of d(t) versus t would be piece wise linear with one of 3 possible slopes in any segment. In the figure below, each segment is labelled with which car(s) is moving during that segment.

Over a long enough time t, d(t) will be significantly above the dotted line, i.e. the value of d(t) that one would expect if there were no traffic lights. Therefore, in the long term, Car 2, which started out with a small velocity advantage ends up far far ahead of Car 1, thanks to non-linearities introduced by the traffic lights.

I have often mused about the similarity between the scenario above and other aspects of life, such as career growth. Just like the traffic lights, there are a number of thresholds in real life, e.g cut-off marks for the A+ grade in exams, a threshold for being accepted at a prestigious university or a job interview, or the difference between getting and not getting an award. Two individuals that differ by only a small amount, will, over the course of a long time interval, find themselves on different sides of these thresholds enough number of times. Thus, small differences in skill level and/or determination between very similar workers get amplified, leading to drastic differences in their achievements over a lifetime. Natural processes around us, including interactions between humans, are highly non-linear, and my suspicion is that humans do not accurately perceve the extent of the non-linearities. The fact that long term benefit can be superlinear in effort is a significant realization, because it means that putting in just a little bit of extra effort -- e.g. an extra hour at work daily, an extra skill, extra effort in networking or maintaining relationships -- can have a disproportionately positive effect in the long term. I'll end with a passage from Richard Hamming's 1986 lecture at Bell Labs titled 'You and Your Research' that touches upon this:

.. Now for the matter of drive. You observe that most great scientists have tremendous drive. I worked for ten years with John Tukey at Bell Labs. He had tremendous drive. One day about three or four years after I joined, I discovered that John Tukey was slightly younger than I was. John was a genius and I clearly was not. Well I went storming into Bode's office and said, ``How can anybody my age know as much as John Tukey does?'' He leaned back in his chair, put his hands behind his head, grinned slightly, and said, ``You would be surprised Hamming, how much you would know if you worked as hard as he did that many years.'' I simply slunk out of the office!

What Bode was saying was this: "Knowledge and productivity are like compound interest.'' Given two people of approximately the same ability and one person who works ten percent more than the other, the latter will more than twice outproduce the former. The more you know, the more you learn; the more you learn, the more you can do; the more you can do, the more the opportunity - it is very much like compound interest. I don't want to give you a rate, but it is a very high rate. Given two people with exactly the same ability, the one person who manages day in and day out to get in one more hour of thinking will be tremendously more productive over a lifetime. I took Bode's remark to heart; I spent a good deal more of my time for some years trying to work a bit harder and I found, in fact, I could get more work done.

1 hour extra per day turns out to be equal to be an extra ~1.5 months of 8 hour work days per year!

Stopping Problems

Optimal stopping problems frequently pop up in real life decision making. I have been collecting a bunch of stopping problems, with the aim of clustering similar problems together, and finding general solution strategies to specific flavors of problems. Here is my (growing) list:

  1. Hiring, Marriage, Parking, Home selling: You are to hire an employee from a pool of N candidates. Each candidate has a fixed integer as its rank, but the rank of a candidate becomes known only after the interview. After each interview, you must decide whether to hire that candidate or to interview the next one. On moving to the next candidate, you loose the opportunity to hire the one you just interviewed. If you hire none of the first N-1 candidates, you must hire the N^th. When should you stop interviewing? Other versions of the same problem: Marriage: Replace candidates with potential mates. House selling: The goal is to sell a house at max profit. Parking: To look for a parking spot so as to park your car nearest to the entrance of a supermarket. You can either choose an empt spot or pass up on it in the hope of getting a better one. Other versions: (i) N is potentially infinite, but there is a cost proportional to the amount of time you take to decide. (ii) In the house selling problem, instead of ranks, one observes a real number as the score of a candidate. The goal is to stop at the candidate with as high a score as possible.
  2. Bug finding, proof-reading: Review of a piece of code or writing has a certain cost per review $C. The number of bugs is a random number with some known distribution. An undetected bug costs $B. Each time a review is done, each bug present has an iid chance of being detected and fixed, with some known probability p. When to stop reviewing?
  3. Gaussian process excursions: You are given a Gaussian process X(n), n = 0, 1, 2.. such that X(n) is Gaussian distributed with mean zero, and a known covariance function K(i,j) between X(i) and X(j). You are allowed to observer a maximum of N successive samples of X(t), and at each step decide whether to stop or continue. The reward upon stopping at $latex t_1 \leq N$ is $latex X(t_1)$. This problem generalizes problem #1, by allowing observations to be correlated rather than IID.
  4. A drunkard can walk either 1 step west or 1 step east at each time step, with equal probability. When must she stop, in order to maximize her distance from where she started? What if the number of steps is limited to N? Another version of the same problem: You have a shuffled deck of cards and you must draw cards from it, one at a time. A red card gives you $1 and a black card makes you $1 poorer. When should you stop picking cards in order to maximize profit?
  5. You are to toss an unbiased coin repeatedly. When you stop tossing, your reward is the fraction of tosses that came up heads. When do you stop tossing? What if you are allowed only N tosses?
  6. You are to throw an unbiased die a maximum of N times. When you stop throwing, your reward is the number on the upturned face of the die. How do you decide when to stop? This is a bit like the version of the Hiring problem in #1 above in which each candidate has a score rather than just a rank.
  7. Two players A and B play a game. Player A writes down two different integers chosen from the set {1, 2, ..., 100} on two separate pieces of paper and places them face downward on a table. Player B's task is to pick one of the pieces of paper, and either stop if she believe it is the larger of the two number, or pick the 2nd one if she believe the 2nd one is the larger one. What should be the strategy of player B? What should be the strategy of player A if her goal is to make player B loose?
  8. A discrete time stochastic process that produces iid random variables $latex X_1, X_2,... $ from a known distribution, changes at a certain time T to a different distribution. The task is to detect the change as quickly as posible after it has occurred, assuming the the cost of detection is proportional to the time since the change and that there is a fixed cost to a false alarm.
  9. [One armed Bandit] As a doctor, you have two possible treatments for a certain disease. Treatment A is known to work with probability p_A, and treatment B has an unknown probability of working, but with some prior distribution over the probability of working. Given N patients to be treated sequentially, how do you decide which treatment to use, if the goal is the maximize the number of patients cured. This is called a 1-armed bandit problem because of the following equivalent problem: A casino slot-machine has two lever. Pulling a lever provides the player with either $1 or $0. The lever on the left is known to have a probability of p_left of providing $1 in winnings, while the lever on the right has an unknown probability. How do you go about picking levers if you have N tries? If you have infinitely many tries? These are stopping problems because, if, by virtue of trials of treatment A, if ever becomes known that treatment B is better, then it is optimal to continue using treatment B thenceforth. The problem therefore boils down to one in which one need only find a time at which to switch from using treatment A to treatment B.

Finite number of steps

It turns out that stopping problems are non-trivial to solve when the number of tries is infinite (e.g Question #2 above). But when the number of tries is finite, then dynamic programming is often useful in arriving at an optimal strategy.

For a simple example of how DP can help find an optimal stopping strategy, let us take Question #3 with N=3. A finite horizon provides a way to reason about the problem at each step by working backwards. After having tossed the die 2 times, it makes sense to toss the die a 3rd time only if the expected outcome of the 3rd toss is better than the 2nd toss. Since tosses are IID, the expected outcome of the 3rd toss is 3.5 and therefore it makes sense to go into the 3rd toss only if the 2nd toss is a 1,2 or 3, but not if it is a 4,5, or 6.  Applying the same logic, working backwards one stage: after having thrown the die the first time, it makes sense to toss a 2nd time only if the expected reward from continuing the game is higher than the present reward. The expected reward from going into the 2nd throw can be computed by considering two disjoint cases: if the game stops with the 2nd throw and if the game continues into the 3rd throw. This gives: (3/6)(4+5+6)/2 + (3/6)(1+2+3+4+5+6)/2 = 4.25. Therefore the 1st throw is good enough to end the game with only if it is > 4.25, that is, if it is a 5 or a 6.

Problem #1 has been shown to have a relatively simple solution, which was pretty surprising when I first came across it. The solution, for reasonably large values of N, is: observe the first ~37% of the candidates without stopping, and then pick the first one that beats all the ones before it. The intuition is that observing some of the candidates without stopping provides information about the quality of candidates, which provides a baseline for making a decision. Stopping too soon is suboptimal because you do not have sufficient information to make an informed decision. Stopping too late is suboptimal because you do not have much choice left. The 37% solution is provable optimal and is guaranteed to find the best candidate 37% of the time in repeated trials. A proof is readily available by googling.

Habit Formation, Part 2

This post builds upon an earlier post on modeling habit formation. To follow this post, please read this post first. Go on, I'll wait. The urn model in that post is not really a complete description of habit formation because the limit distribution of the system (i.e., fraction of white balls after a very large number of draws) is completely determined by the initial configuration of the urn (i.e., number of white and black balls). In reality, habits are formed from repeated actions, and our actions are based on not just our inclinations but also the conscious choices that we make, which can sometimes be against our inclinations. Free will is missing from this model. There is no way to break a bad habit if one start out with one.

Therefore, let us alter the basic urn model slightly to capture choice: At each step, before picking a ball, we explicitly declare which color we are interested in for that pick: a white ball ( = follow the desirable habit) or a black ball ( = follow the undesirable habit). Suppose we choose to aim for a white ball. A ball is then sampled from the urn at random, and if a white ball does indeed come up, then:

  1. A payoff of $W is received (We wanted white and we got white, so we are happy to the extent of $W).
  2. The white ball is placed back into the urn, along with an extra white ball (Reinforcement).

But if we decide to pick white and a black ball comes up, then there is no payoff, and the black ball is placed back in the urn without any extra ball. However, if we chose black, a black ball is guaranteed to show up, we get a payoff of #B, where # is a different currency from $, and the black ball is placed back along with an extra black. The extra black ball makes picking a white ball harder when a white is desired.

Consider the implications of this model:

  1. Suppose there are very few white balls compared to black. A white ball would rarely show up. With the decision to pick a white, most of the time there will be no payoff and no reinforcement. But with the decision to pick a black ball, there is guaranteed payoff of #B units, but at the cost of picking a white later on harder.
  2. As the bad habit is reinforced, the chance of reverting to the good habit never dies away completely, but diminishes as the number of black balls grows in comparison to the whites. ('It's never too late, but it does get harder').
  3. The purpose of the different currencies is to model the fact that the pleasure obtained from following a bad habit is qualitatively different from the one obtained from following a good one. Since most habits are based on short-term pleasure at the expense of long term benefit (e.g. cigarette smoking, binge eating /drinking, putting off exercise),  currency # may correspond to lots of short term kicks, while $ may correspond to long term well being.

Short-term thrills

I believe human behavior is fundamentally driven by a desire to be happy. However, different individuals can have very different ideas about what will make them happy. Since most bad habits result from a focus on short term happiness, at the expense of long term happiness, we would do well to make the following tweak: # and $ can be added to give a total payoff, but each unit of # currency is worth only $latex \epsilon < 1$ unit, 1 iteration after it was earned. The other currency, $, however, does not decay with time. The question is what behavior emerges when the objective is simply maximization of total payoff? It is intuitive that the answer will depend upon whether the game has a finite horizon or an infinite horizon. Since players have finite lifetimes, it is appropriate to use a finite time horizon, but since a person does not know when he/she will die, the horizon must be a life expectancy with some uncertainty (variance). In other words, nobody knows when they will die, but everybody knows they will die. The only time it makes logical sense to pick a black ball is if the player believes his remaining lifetime is so small that the expected cumulative gain from deciding to pick whites is smaller than the expected cumulative gain (including the decay effect) from picking blacks. Of course this depends upon how he spent his past -- i.e. whether or not he built up a high fraction of whites.

Breaking a bad habit

Suppose a person starts out with a predominant inclination towards an undesirable habit and wishes to switch to the good habit. It is clear that if she always chooses the white ball, then she will eventually succeed in creating a larger number of white balls and a good $ balance. But she will have to go many tries and no payoff before that happens. In practice, this requires determination, and she may be tempted to pick the black ball because it is so easily available.

Suppose $latex D \in [0,1]$ is the fraction of times that (s)he will decide to aim for a white ball, i.e. the player's determination. It is intuitive that a larger value  $latex D$ would help her switch to the good habit in fewer iterations. It would be interesting to see if there is a threshold value of $latex D$ below which the bad habit is never broken. In particular, I expect there to be a threshold value $latex D(w,N-w)$, below which the bad hait is never broken, w.p. 1, where $latex w$ is the number of whites in the urn and $latex N-w$ is the number of blacks.

Game over when wealth < 1 unit

Now let us further suppose, that in the course of playing the game, a player can never have less than 1 currency units. In other words, the game stops if the total currency units with a player reaches zero. All other rules are the same: 1 unit earned from a white ball, 1 unit from a black ball but units earned from black balls decay to $latex \epsilon$ of their value every time slot. Deciding to pick a black guarantees a black, and an extra black goes in. Deciding to pick a white, provides a white with a probability proportional to the fraction of whites in the urn, and an extra white goes in. With the new rule stipulating 'game over' when the player's wealth falls below 1, the player is incentivized to pick a black if she ever approaches 1 unit of wealth. This is meant to model the inclination of individuals with low self worth and happiness levels to engage in activity that would yield short term happiness (alcohol, drugs, binge eating, etc. ) at the expense of long term prospects.

Prime Factorization Trees

A positive integer can be visualized as a tree, structured by its prime factors. I wrote a little program which takes a positive integer as input, and creates a tree layout that shows its prime factors visually. Here's the tree for 70:

Screen shot 2012-10-07 at 9.55.37 AM

Screen shot 2012-10-07 at 9.55.37 AM

The tree is drawn recursively, starting with a root node in the center of a circular layout. The degree of the root node is the largest prime factor. Then, more prime factors are added recursively (including multiplicities of the same prime factor) in decreasing order of factor, by growing the tree. Finally, the leaf nodes are displayed as filled black circles. As seen from the tree above, the prime factorization of 70 is 7*5*2.

Here is the layout for 700 = 7 * 5 * 5 * 2 * 2:

Screen shot 2012-10-07 at 11.17.22 AM

Screen shot 2012-10-07 at 11.17.22 AM

This method of displaying numbers has been proposed and implemented by others on the internet, with much prettier results. I was too tempted to quickly try it for myself using a graph visualization library, because the graph drawing library can do all the heavy lifting. I used Graphviz (developed at AT&T Labs) which I use often for visualizing network data. I also wanted to see if the underlying tree structure revealed by the edges in the graph makes the prime factorization more evident.

Here are a few more trees:

243 = 3^5:

Screen shot 2012-10-07 at 11.44.01 AM

Screen shot 2012-10-07 at 11.44.01 AM

43 (a prime):

Screen shot 2012-10-07 at 10.04.39 AM

Screen shot 2012-10-07 at 10.04.39 AM

128 = 2^7:

Screen shot 2012-10-07 at 11.04.56 AM

Screen shot 2012-10-07 at 11.04.56 AM

121 = 11 * 11:

Screen shot 2012-10-07 at 10.55.17 AM

Screen shot 2012-10-07 at 10.55.17 AM

625 = 5^4:

Screen shot 2012-10-07 at 11.42.04 AM

Screen shot 2012-10-07 at 11.42.04 AM

2310 = 11 * 7 * 5 * 3 * 2:

Screen shot 2012-10-07 at 11.09.27 AM

Screen shot 2012-10-07 at 11.09.27 AM

Code gist (Without prime factorization implementation and graph library):

[sourcecode language="python"] def createnode(x): # x == 1: support node # x == 0: real node global nodenames name = random.random() while name in nodenames: name = random.random() nodenames.add(name) if x == 1: retval = '"'+str(name)+'"'+whitenode else: retval = '"'+str(name)+'"'+blacknode print retval return '"'+str(name)+'"'

def createedge(x,y,t): # x and y are nodes, t is type (0=last stage of the tree, 1=all other stages) if t == 1: print x + '--' + y + ' [color="#DDDDDD"]' if t == 0: print x + '--' + y + ' [color=grey]' return x + '--' + y

pf = factorization(n) # keys are factors, values are the multiplicity of the prime factors factors = [] for fac in pf: for multiplicity in range(0, pf[fac]): factors.append(fac) factors = sorted(factors, reverse=True) print 'graph G{' print 'overlap = false' nodenames = set() edgeset = set() whitenode = ' [fixedsize=True, width=0.1, height=0.1, color = white, fontcolor = white, shape = circle]' blacknode = ' [color = black, style = filled, fontcolor = black, shape = circle]' rootnode = createnode(1) currentset = set() currentset.add(rootnode) level = len(factors) + 1 for i in range(0, len(factors)): level -= 1 f = factors[i] nodetype = 1 if i == len(factors) - 1: nodetype = 0 newnodeset = set() for eachnode in currentset: for j in range(0, f): newnode = createnode(nodetype) newnodeset.add(newnode) edgeset.add(createedge(eachnode, newnode, level, nodetype)) currentset = newnodeset print '}' [/sourcecode]

For prime factorization, I used the this implementation.

Learning using Puzzles

Mathematical and logic puzzles are a great way to learn. I think they should be used to a greater extent in school and college education than they are, because they:

  1. Create an excitement in the mind of the student/solver.
  2. Force the solver to think about what method to use to attack the problem. This is much closer to real world situations, where one is not told about what area a problem is in. Tech companies recognize this when they use puzzles  while interviewing job candidates as a way to observe how the candidate thinks.
  3. Over time, allow students to recognize patterns in problems and the best way to attack them, so that wehn they are faced with a new, real world problem, they have a good idea of what to try first.

I am certainly not the first one to note the educational value ofpuzzles. Martin Gardner, whose puzzles I have tremendously enjoyed has himself advocated for puzzle-based learning.

Each puzzle is usually based on one (or, sometimes two, but very rarely more than two) fundamental areas of math: probability, number theory, geometry, critical thinking etc. For example, take the following puzzle (try these puzzles before reading answers at the end of the post):

[1] By moving just one of the digits in the following equation, make the equation true:

$latex 62 - 63 = 1$

It should be clear that this puzzle is part math and part logic.

Often there is more than one way to arrive at the solutions -- e.g. using a geometric visualization for a algebraic problem in 2 or 3 variables. For example take the following puzzle:

[2] A stick is broken up into 3 parts at random. What is the probability that the thee parts form a triangle?

One may choose to solve the above problem using algebra, but it can be visualized geometrically too.

Speaking of geometric methods, there is also something to be said about solving puzzles inside one's head, rather than using pen and paper. I feel it adds to the thrill because it can be more challenging. It is also fun to visualize objects and their relationships in mathematical terms,  completely inside one's head. The following is an example of a pretty hard puzzle, which was solved entirely inside one mathematicians head :

[3] Peter is given the product of two integers both of which are between 2 and 99, both included. Sally is given only their sum. The following conversation takes place between them:

P: I cannot figure out the two numbers from the product alone.

S: I already knew that you couldn't have figured them out.

P: Oh, now that you say that, I know what the two numbers are.

S: I see. Now that you say you know what they are, I also know what they are.

What are the two integers?



[1] $latex 2^6 - 63 = 1$

[2] Without loss of generality, assume that the stick is of unit length. Since the three lengths add up to 1, and are each $latex \in [0,1]$, they can be treated as three random variables $latex (x,y,z)$ in 3D space, lying on the 2-dimensional triangular simplex with vertices at (1,0,0), (0,1,0) and (0,0,1). The problem now boils down to determining what fraction of the surface of this triangle corresponds to tuples that cannot form a triangle. From the triangle inequality, we know that $latex x + y > z$ and so on for the other two pairs. A little mental visualization reveals that these boundary conditions are met when each side of these inequalities is $latex \frac{1}{2}$. The region corresponding to feasible points is a smaller equilateral triangle inscribed within the simplex and is one-fourth the area of the simplex. The answer therefore must be $latex \frac{1}{4}$.

[3] The answer can be found in this PDF of a note written by EW Dijkstra, who solved the problem in his head.

Document as a Database

The problem with conventional text documents is that you cannot easily query them. Suppose I have a 20 page piece of text explaining various things, perhaps with mathematical equations, and plots, etc. In order to figure out something, I need to patiently read through the document till I have my answer. When time is short, and usually time is short, I will skip some parts in a hurry to get to what I need. The skipped parts might contain intermediate facts or definitions, making it impossible for me to understand what I am looking for even when I do find it. This situation is because information is being presented in the document in an order decided by the author. Perhaps this is the best order. But for a specific query (e.g. what is the main result of this paper? ), it would be nice to have a way to do better than having to patiently read the whole document. The best we have right now is Ctrl+F on words and phrases, which just doesn't cut it for queries of reasonable complexity (e.g. 'Why did the GDP of Germany double so quickly?' while reading an article on the history of the German economy).

The computer doesn't understand the document :-(

It would be pretty useful to somehow model documents as databases that can support complex queries. Especially when the documents are large, like books. Perhaps I am asking for the holy grail of NLP? Surely, if IBM can build a Jeopardy-winning Watson, it should be possible to build a question-answering service for individual documents? Outside information is permitted.

Andre Geim

"Many people choose a subject for their PhD and then continue the same subject until they retire. I despise this approach. I have changed my subject five times before I got my first tenured position and that helped me to learn different subjects." - Sir Andre Konstantin Geim (2010 Nobel Prize winner in Physics).

You are my hero Andre.

Make webcomics easily

I have wanted to draw webcomics for at least the last 3 years, but never got around to figuring out an easy way to do it. Drawing on a computer with a mouse is a disaster. There are special tablets like Wacom's that artists use (I believe xkcd uses Wacom's tablet). There are also some high-end, expensive iPad apps (e.g. brushes). But I wanted to do very simple drawing and without spending any money. I finally decided to put together a easy to use method for drawing simple black & white comics. I think the simplest way to draw a webcomic is low-tech:

  1. Draw on paper with a black marker.
  2. Take a picture with your phone and upload to a computer. Unfortunately, a picture taken by a phone is not simple b&w, but contains many colours, graininess of the paper, the effect of lighting, etc. Therefore:
  3. Run the picture though a pixel thresholder.

I wrote a small python program to threshold the value of each pixel and make each pixel either pure white or pure black. The result allows me to go from the JPG picture on the left (taken with an iPhone 3GS) to the PNG image on the right:

Each pixel in the left image is simply an integer between 0 and 255 (after converting to grayscale). Now, this should be easy to do using a basic image editing program by 'converting to B&W', but the caveat is that one cannot simply use a threshold of 128 because the pixels corresponding to blank space on the paper may not necessarily be below 128. The result of using a fixed threshold of 128 was a disaster:

There is probably a menu option somewhere on a good image editor to do what I want, but I don't want to deal with image editors, and besides, I want to automate webcomic creation as much as possible. So I settled on running k-means clustering on the pixel values with k=2 clusters. This gives two clusters, one corresponding to the blacks and one corresponding to the whites in the image:

Histogram of pixel values (0-255). Cluster of blacks on the left with centroid at 16. Whites on the right, centroid at 110.

The average of the two centroids can be taken to be the threshold. The threshold for the example above came out to be 63 -- pretty far from 128! By the way, the mean pixel value is 103, which would clearly not have worked as well as the mean of the cluster centroids did.

This code may be useful to others who want to put up their comics/doodling on the web. Unfortunately, the program has a number of python module dependencies (Scipy, PyPNG, Numpy), because I just wanted a quick hack. But I'll put up the code here anyway. Someday, I'll get a github account.

[sourcecode language="python"] import numpy, png, os, sys, pylab as plt import scipy from scipy.cluster.vq import vq, kmeans

infile = sys.argv[1]

print 'Converting to PNG...' #Make sure input is a png pngfile = infile.split('.')[0]+'.png' os.system('convert ' + infile + ' ' + pngfile)

print 'Reading in pixels...' f = open(pngfile, 'rb') r = png.Reader(file=f) k = r.read() l = list(k[2]) numplanes = k[3]['planes'] l = zip(*l) numrows = len(l) l = l[0:numrows:numplanes] # This will skip every numplanes element l = zip(*l) f.close()

# Create a 1-D array of all the pixel values l = numpy.array(l) pixels = [] for i in range(0, len(l)): for j in range(0, len(l[0])): pixels.append(l[i][j]) pixels = numpy.array(pixels)

print 'Computing threshold...' # Find the adaptive threshold by running one-dimensional k-means, with k = 2, and find the mean of the two cluster centers. j = scipy.cluster.vq.kmeans(pixels, 2) adaptivethreshold = (j[0][0] + j[0][1])/float(2)

print 'thresholding...' # now threshold for i in range(0, len(l)): for j in range(0, len(l[0])): if l[i][j] < adaptivethreshold: l[i][j] = 0 else: l[i][j] = 255

print 'Writing to file...' # Now write: thresholdfile = pngfile.split('.')[0]+'-threshold.png' f = open(thresholdfile, 'wb') w = png.Writer(len(l[0]), len(l), greyscale = True) w.write(f, l) f.close() [/sourcecode]

The Effect of Interruptions

Tasks that require a high cognitive load, such as thinking about a problem, reading a research paper or writing C++ code,  are very sensitive to interruptions. I find that when I am interrupted during such activity, say by a phone call, or by a co-worker asking me something, the net cost of the interruption is not just the amount of time I spend attending to the call or person. It's as if I have a short term RAM in my head, and I have to re-load the entire context of what I was doing when I get back to it. And this can take a lot of time. What I like are large chunks of uninterrupted time, not several small chunks, even if they add up to the big chunk. However, shutting oneself off from all interactions with others at one's workplace is not the solution, because one runs the risk of not interacting enough with colleagues. Richard Hamming said about Bell Labs:

I notice that if you have the door to your office closed, you get more work done today and tomorrow, and you are more productive than most. But 10 years later somehow you don't know quite know what problems are worth working on; all the hard work you do is sort of tangential in importance.

I have personally found that conversations over coffee and lunch have led to much more interesting directions in my work than solitary thinking. While Hamming was speaking about a research lab environment, I suspect his statement holds true in other creative domains as well.

The apparently contradictory requirements above can be resolved by planning out both solitary time and interaction time during the work week. Both are necessary. I feel creative work requires a combination of two distinct phases. One that allows focused concentration on a task at hand, and another for free discussions, exchange of ideas, and making oneself available to others as a sounding board. Exploit and Explore. An ideal workspace should provide for both. It is said of the original Bell Labs building, that its designer deliberately created long corridors, so that researchers would be forced to encounter one another. Steven Johnson explains in his book 'Where Good Ideas Come From' that coffee houses played a crucial role, because they were breeding grounds for collective thinking.

During my travels on the Internets, I have come across a number of writings on the impact of interruptions on productivity that I can attest to from my own experience:

Paul Graham writes about meetings:

I find one meeting can sometimes affect a whole day. A meeting commonly blows at least half a day, by breaking up a morning or afternoon. But in addition there's sometimes a cascading effect. If I know the afternoon is going to be broken up, I'm slightly less likely to start something ambitious in the morning. I know this may sound oversensitive, but if you're a maker, think of your own case. Don't your spirits rise at the thought of having an entire day free to work, with no appointments at all? Well, that means your spirits are correspondingly depressed when you don't. And ambitious projects are by definition close to the limits of your capacity. A small decrease in morale is enough to kill them off.

Joel Spolsky writes about context switching between different projects.

Some interruptions are self created. Compulsive checking of email for instance. Or deciding to multitask, or trying to handle too many disconnected projects. I find I can do one major thing per day, if I try to do 2 or more major things, I risk accomplishing none. Having smartphones set to beep when an email arrives doesn't help. I switched my phone to not auto check email, long ago.

Time Management

I recently came across a surprisingly lucid model for time management. The model consists of just two variables that together categorize the tasks we do roughly into 4 bins. The two variables are 'Importance' and 'Urgency'. That is, a task has an importance score, and an urgency score. To simplify even further, imagine that each of the two variables is just a binary variable: that is, a given task is either important or not, and either urgent or not.

It is necessary to first understand the difference between 'important' and 'urgent'. An important task is one which is aligned with one's long term goals or makes a significant difference to one's life in the long term. An urgent task is one which demands immediate attention.

Which of the four quadrants above do you think we should operate in? Let's look at each of them:

4: Not urgent, and not important: This is the most useless category. Examples would include mindless browsing, excessive use of Facebook. This  is the quadrant of waste.

3: Not important but urgent: Petty things that sometimes take away our attention and time, preventing us from focusing on the important things, for example: receiving an email about a shopping sale on a website that is about to expire can lead us to stop what we are doing and start looking into shopping online. Very dangerous. This quadrant is called the quadrant of deception.

2: Urgent and important: When something important is delayed till it can no longer be delayed, it becomes urgent. Example: John wants to go to graduate school after college, but has delayed working on his application till the very last minute. Now the task is both important and urgent. This is the quadrant of procrastination. Important tasks done in a hurry are rarely done properly, and do not lead to satisfaction.

1: Not urgent, but important: This is the quadrant of planned action. It is the quadrant we want to be in in order to manage time well. Important tasks are planned and assigned time slots before they become urgent.

Note to self: Eliminate tasks in Q4, Q3, and move tasks in Q2 to Q1.

Edit: I just learnt that this model was popularized by Steven Covey, who passed away unexpectedly earlier this year.

Habit Formation

An urn contain balls of two colors, white and black. A ball is drawn at random, and then put back in the urn along with a new ball of the same color. This process is repeated  many times. What is the ratio of white balls to black balls in the urn after $latex n$ tries? What is the ratio as $latex n \rightarrow \infty$? The answer depends upon the initial number of white and black balls of course. This is the Polya Urn problem, and I find it to be an interesting way to think about the process of habit building. Imagine that the two types of balls represent two opposing choices that we can make about a certain aspect or activity in our lives, e.g. waking up early or not waking up early, doing regular exercise or not, smoking a cigarette after dinner, or not, completing homework on time every evening vs. playing video games during that time, etc.

[sourcecode language="python"] # Polya urn simulation in Python import random, pylab as plt

nruns = niter = 200 whitenum = blacknum = 1 for i in range(0,nruns): frac = [] nwhite = whitenum nblack = blacknum for j in range(0,niter): if random.random() < nwhite/float(nwhite+nblack): nwhite = nwhite + 1 else: nblack = nblack + 1 frac.append(nwhite/float(nwhite+nblack)) plt.plot(frac) [/sourcecode]

Let us say that picking a white ball corresponds to the desirable activity in these examples, and picking a black ball corresponds to the undesirable one. The fraction of white balls in the urn at any given time represents our proclivity to pick the desirable activity at that time.

This model has some interesting properties. First, if the number of balls of each type to begin with are equal, then the fraction of white balls at any step, and in particular at $latex n \rightarrow \infty$ is a random variable that behaves as follows:

Above: Fraction of whites, starting with 1 black and 1 white ball

Above: Histogram of the fraction of white balls after 20k iterations

Note that extreme outcomes are possible as $latex n \rightarrow \infty$ if appropriate choices are enforced early on. Also the final value is attained after only a few iterations. Early iterations provide the opportunity for large swings in the final outcome, but there are hardly any swings after crossing about 50 iterations. Similarly, the that longer a habit has had to cement itself, that harder it is to change it.

But the model above converges to an almost uniform distribution on the fraction of white balls after a large number of tries. Perhaps starting with 1 white and 1 black isn't quite right, because we are do not necessarily begin life with equal tendencies for picking each side of a binary choice. If we set the number of white balls to be greater than the number of black balls at the start, then the distribution of the fraction of white balls after many tries is, as expected, skewed in favor of white balls. Again, the fraction of white balls usually settles down to a fairly stable value that is typically  determined largely by the actions in the first few iterations.

Above: Fraction of whites, starting with 1 black and 15 white balls

Distribution of fraction of whites after 20k iterations starting with 5 whites, 1 black

If the initial number of balls is allowed to be fractional, then the limit distribution has most of its mass near 0 and 1. The plot below shows 500 iterations starting with 0.2 white and 0.2 black balls. Perhaps this better models habit formation in some people.

Fraction of white balls, starting with 0.2 whites and 0.2 blacks.
Distribution of fraction of whites, after 20k iterations, starting with 0.2 whites & blacks.

Habit building is not the only thing this model can describe. There are a number of phenomena in which reinforcement plays a role:

  1. Popularity of a brand: Among competing brands of similar quality, one that is slightly more popular may get picked more by customers, making it even more popular ("So many people choose this brand, so it must be good"). I'm sure marketing majors know this well.
  2. Rich get richer: It is easier to make $X when you have $100, compared to if you have only $10 to begin with.
  3. Short term market fluctuation in the price of a security on the stock exchange can sometimes have this property. A large number of buy orders will signal to the market a positive outlook on the security, resulting in even more buy orders. Ditto for sell orders. Until the market stabilizes. Similar to (1) above.
  4. Markets with network effects have a rather obvious reinforcement property: for e.g., the greater the number of users that facebook has, the greater the number of new users it is likely to attract, compared to, say, a competing network like Google+, because the marginal utility for a new user is greater when she join the larger network.
  5. Tragedy of the commons [PDF]: Defined by Wikipedia as: 'The depletion of a shared resource by individuals, acting independently and rationally according to each one's self-interest, despite their understanding that depleting the common resource is contrary to their long-term best interests'. Example: A grassy hill is shared by several farmer to graze their cows. If one farmer overgrazes, others have an incentive to quickly overgraze too, and eventually the utility of the hill is destroyed for all farmers.
  6. Hiring in a new company/group: If the organization has a good number of great/well-known people, it is easier to attract other good employees. Similar to (1).
  7. Thinking: Sometimes people make a choice about thinking about something in a certain way (e.g. so-and-so is a mean person), and this colors their interpretation of future observations, in effect reinforcing their prior notion.


1. In the form above, the urn model displays positive feedback. How can we change the urn model to display negative instead of positive feedback? Simple: put in an extra black ball if a white ball is picked and vice versa. This forces the fraction of white and black balls to remain close to 1/2.

2. Learning by rewards: There has been work in the psychology community that studies reinforcement learning in humans and animals, that is, given a set of choices and associated rewards, how does an animal learn which choice to pick? The Law of Effect is the hypothesis that the probability of picking a choice is proportional to the total reward accumulated from picking that choice in the past. Given a reward scheme, this translates to an Urn model with the colors representing rewards from the various possible actions. For a nice survey of reinforcement models, see: Robin Permantle, A survey of random processes with reinforcement, Probability Surveys, Vol. 4 (2007) .

3. The model above is the simplest urn model. There are several variations of this model, that have proved useful in modeling and learning tasks, other than reinforcement. E.g. using more than 2 colors, or using multiple urns, or allowing for the introduction of new colors. There are a number of interesting distributions for which these Pola Urn variations act as generating processes (Dirichlet-multinomial, Dirichlet Process, Chinese Restaurant Process), but that's another post.