Python as a statistics workbench

Lots of people use a main tool like Excel or another spreadsheet, SPSS, Stata, or R for their statistics needs. They might turn to some specific package for very special needs, but a lot of things can be done with a simple spreadsheet or a general stats package or stats programming environment.

I've always liked Python as a programming language, and for simple needs, it's easy to write a short program that calculates what I need. Matplotlib allows me to plot it.

Has anyone switched completely from, say, R to Python? R (or any other statistics package) has a lot of functionality specific to statistics, and it has data structures that let you think about the statistics you want to perform rather than the internal representation of your data. Python (or some other dynamic language) has the benefit of allowing me to program in a familiar, high-level language, and it lets me programmatically interact with the real-world systems in which the data resides or from which I can take measurements. But I haven't found any Python package that would allow me to express things in "statistical terminology" – from simple descriptive statistics to more complicated multivariate methods.

What can you recommend if I wanted to use Python as a "statistics workbench" to replace R, SPSS, etc.?

What would I gain and lose, based on your experience?



It's hard to ignore the wealth of statistical packages available in R/CRAN. That said, I spend a lot of time in Python land and would never dissuade anyone from having as much fun as I do. :) Here are some libraries/links you might find useful for statistical work.

  • NumPy/SciPy You probably know about these already. But let me point out the Cookbook, where you can read about many statistical facilities already available, and the Example List, which is a great reference for functions (including data manipulation and other operations). Another handy reference is John Cook's Distributions in Scipy.

  • pandas This is a really nice library for working with statistical data -- tabular data, time series, panel data. Includes many built-in functions for data summaries, grouping/aggregation, and pivoting. Also has a statistics/econometrics library.

  • larry Labeled array that plays nice with NumPy. Provides statistical functions not present in NumPy and good for data manipulation.

  • python-statlib A fairly recent effort which combined a number of scattered statistics libraries. Useful for basic and descriptive statistics if you're not using NumPy or pandas.

  • statsmodels Statistical modeling: Linear models, GLMs, among others.

  • scikits Statistical and scientific computing packages -- notably smoothing, optimization and machine learning.

  • PyMC For your Bayesian/MCMC/hierarchical modeling needs. Highly recommended.

  • PyMix Mixture models.
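
To give a flavor of the "data summaries, grouping/aggregation" that pandas offers, here is a minimal sketch. The column names and the numbers are made up purely for illustration; only `DataFrame`, `groupby`, and `agg` are actual pandas calls.

```python
import pandas as pd

# Hypothetical measurements: a grouping factor and one numeric response.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
})

# Per-group descriptive statistics: sample size, mean, sample std (ddof=1).
summary = df.groupby("group")["value"].agg(["count", "mean", "std"])
print(summary)
```

The result is itself a table indexed by group, which is roughly the "think about the statistics, not the representation" feel the question asks about.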

If speed becomes a problem, consider Theano -- used with good success by the deep learning people.

There's plenty of other stuff out there, but this is what I find the most useful along the lines you mentioned.
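
As a rough illustration of the distribution and testing facilities in SciPy mentioned in the list, the sketch below evaluates a normal CDF and quantile and runs a two-sample t-test. The two samples are invented for the example, and Welch's unequal-variance variant is my choice here, not something the list prescribes.

```python
from scipy import stats

# Standard normal distribution: CDF and its inverse (percent point function).
p = stats.norm.cdf(1.96)    # probability mass below 1.96
z = stats.norm.ppf(0.975)   # quantile with 97.5% of the mass below it

# Made-up samples from two hypothetical conditions.
a = [5.1, 4.9, 5.6, 5.8, 5.0]
b = [6.2, 6.8, 5.9, 7.1, 6.5]

# Welch's t-test (equal_var=False does not assume equal variances).
result = stats.ttest_ind(a, b, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue
print(p, z, t_stat, p_value)
```

Every distribution in `scipy.stats` exposes the same `cdf`/`ppf`/`pdf`/`rvs` style of interface, so the first two lines carry over to other distributions by swapping `norm` for another name.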



http://stats.stackexchange.com/questions/1595/python-as-a-statistics-workbench
