The short answer is “about 6 years”. But boy, do we have a long answer for you.

6 years?
Wasn’t I always told that the average Ph.D lasts about 5 years? We took a hard look at the data, and apparently it doesn’t. We examined data on all the Ph.Ds completed at IIT Bombay since 1990, and the average length is about 5.9 years. Ah, but you object, that’s most likely due to outliers: the few students who take 8, or 10, or 14 (yep) years to complete their Ph.D and skew the mean. Nope. The value of 5.9 is after accounting for all of these. In fact, the Civil Engineering department, which has the shortest average Ph.D length, clocks in at 5.1 years. CSE, at the longest, takes about 6.7 years on average.
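For the curious, here is a minimal sketch of how one might check whether a handful of long Ph.Ds is dragging the mean up. It is not the analysis used for this post: the durations are made up, and the cut-off bounds passed to the hypothetical `trimmed_mean` helper are arbitrary assumptions.

```python
from statistics import mean, median

def trimmed_mean(values, lower, upper):
    """Mean of the values that fall within [lower, upper], inclusive."""
    kept = [v for v in values if lower <= v <= upper]
    return mean(kept)

# Made-up Ph.D lengths in years, for illustration only (NOT the IIT Bombay data).
durations = [5, 5, 6, 6, 6, 6, 7, 7, 8, 14]

plain = mean(durations)                   # pulled upwards by the 14-year Ph.D
middle = median(durations)                # barely affected by a single outlier
trimmed = trimmed_mean(durations, 3, 10)  # assumed bounds: ignore anything outside 3 to 10 years

print(f"mean={plain:.2f} median={middle:.2f} trimmed mean={trimmed:.2f}")
```

If the median and the trimmed mean sit close to the plain mean, outliers are not what is driving the number.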

Here’s a visualization of all the students who have completed their Ph.Ds at IITB since 1990, classified by department, in order of the average Ph.D length:

Coincidentally, the longest (14 years) and the shortest (2 years) Ph.Ds have both been in Chemical Engineering.
Averaging across departments, the distribution looks something like this:

Over 32% of the students took 6 years to complete their Ph.D.

An interesting question to ask is whether the average length of the Ph.D has been constant over the years. Let’s take a look.
The darkness of each circle represents the number of students at that data point.
Wow! It looks like the average Ph.D length has been consistently decreasing! While the trend seems quite shocking at first, there are a couple of catches. First, the data from the earliest few years is drawn from very few data points, and is extremely sensitive to aberrations. In fact, things seem to have stabilized quite a bit by 1995, which is a good reference. Second, a large number of Ph.Ds started in the last few years are still going on! So while it may seem that the average for 2011, for instance, is around 3.7 years, keep in mind that anybody who is planning to take over 4 years is still in the institute, and hence not a part of the dataset at all. This reduces the apparent length for the last few years. This is a classic case of survivorship bias: only the Ph.Ds that have already finished get counted.
Ignoring these periods prone to errors or biases, for the period of 1995 to 2009 the average length seems to have stayed more or less constant, although the number of outliers seems to have increased.
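The bias in the recent years can be reproduced with a toy example. The sketch below gives every joining year exactly the same made-up set of Ph.D lengths, then only counts the students who have graduated by the time the data is collected. The cutoff year of 2015 and the function name `observed_mean` are assumptions for illustration, not values from our dataset.

```python
from statistics import mean

def observed_mean(join_year, durations, cutoff_year):
    """Mean length among students who joined in join_year and have
    graduated by cutoff_year; None if nobody has finished yet."""
    finished = [d for d in durations if join_year + d <= cutoff_year]
    return mean(finished) if finished else None

# Hypothetical cohort: every joining year has the same true lengths (true mean = 6).
true_durations = [4, 5, 5, 6, 6, 6, 7, 7, 8]
CUTOFF = 2015  # assumed year in which the data was collected

for year in (2005, 2009, 2010, 2011, 2012):
    print(year, observed_mean(year, true_durations, CUTOFF))
```

Even though nothing about the students changes from year to year, the observed average drops for the later joining years, simply because the slower students have not shown up in the data yet.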
Hope you enjoyed our foray into data journalism. Do give us feedback on how to improve in the comments section below. Look out for the next post very soon. In the meantime, if you’re feeling adventurous, all code used to gather the data and create the visualizations has been open-sourced. Feel free to play with it and derive some great statistics of your own. Some more crazy visualizations on the dataset are here.
1. Data for some departments has been combined in cases when the departments merged – e.g. IT with CSE, Bio-Medical Engineering with Biosciences & Bioengineering, and so on.
2. All data was sourced from the Electronic Theses and Dissertations (ETD) Archives maintained by the Central Library, IIT Bombay. Every student’s graduation year was scraped from the thesis, and their joining year was calculated from their roll number. Some corrections were made manually. Huge thanks to the ETD for maintaining the archives and making them publicly available.
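As a rough illustration of the calculation in note 2, the sketch below derives a Ph.D length from a roll number and a graduation year. The roll number format is an assumption made for this example (first two digits taken as the last two digits of the joining year), the sample roll numbers are invented, and the function names `joining_year` and `phd_length` are hypothetical; the open-sourced code may well do this differently.

```python
def joining_year(roll_number: str) -> int:
    """Assumed rule: the first two characters of the roll number are
    the last two digits of the joining year."""
    yy = int(roll_number[:2])
    # Assumed pivot: 50-99 means 1950-1999, 00-49 means 2000-2049.
    return 1900 + yy if yy >= 50 else 2000 + yy

def phd_length(roll_number: str, graduation_year: int) -> int:
    """Ph.D length in whole years: graduation year minus joining year."""
    return graduation_year - joining_year(roll_number)

# Invented roll numbers, for illustration only.
print(phd_length("95405001", 2001))
print(phd_length("04405001", 2010))
```

Working with whole years means each length is only accurate to within a year, which is one reason manual corrections can still be needed.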