In my earlier post today, I published some statistics about Python packages. Someone from one of the Python users mailing lists pointed out that the mean and standard deviation are not meaningful for such highly skewed data and that non-parametric statistics would give a better summary. So I will publish some new statistics in this post.
First, second and third quartiles are 4KB, 11.5KB and 38KB respectively. Here is a histogram of
First, second and third quartiles are 0, 2 and 8 respectively. I couldn't normalize the data so no histogram here.
First, second and third quartiles are 16, 39 and 100 respectively.
First, second and third quartiles are 56, 147 and 375 respectively.
Lines of Python code
I am not quite sure exactly how skewness would affect my predictions about the population mean, so I will just publish the quartiles of my sample. First, second and third quartiles were 48, 199 and 498 respectively. It appears that most packages on PyPI are quite small. I also made a boxplot of this, but it was so squeezed that it looked meaningless, so I didn't post it.
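For anyone who wants to compute this kind of summary themselves, here is a minimal sketch using only Python's standard library. The numbers in `sample` are made up for illustration and are not my PyPI data, and the "inclusive" quantile method is just one reasonable choice, not necessarily the one behind the figures above.

```python
import statistics

# Hypothetical, made-up line counts for nine packages (NOT the PyPI sample
# from this post). One very large package skews the distribution.
sample = [10, 20, 30, 40, 50, 60, 70, 80, 5000]

# n=4 splits the data into four equal groups and returns the three cut
# points: first quartile, median and third quartile.
# method="inclusive" treats the sample as containing its own min and max.
q1, q2, q3 = statistics.quantiles(sample, n=4, method="inclusive")

# The mean, for comparison with the quartiles.
mean = statistics.mean(sample)

print("quartiles:", q1, q2, q3)
print("mean:", round(mean, 1))
```

With this made-up sample the mean lands well above the third quartile, because the single large value drags it up, while the quartiles still describe the typical package. That is the same effect the mailing list comment was pointing at.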