A Curated Collection of Learning Resources - CSDN Blog

Original post: https://blog.csdn.net/haidao2009/article/details/9491567

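I recently started an internship and, sadly, no longer have much time to study. So I am collecting here some resources that look excellent but that I have not had time to read yet, to work through later.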

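If the highest level of understanding a technology is being able to explain it in the simplest possible way, then Igor's understanding of CPU caches has certainly reached that level. His blog post Gallery of Processor Cache Effects (http://t.cn/hrXwvb) uses 7 extremely simple code examples to cover key topics such as Cache Line, Cache Size, and False Sharing. Truly impressive.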
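The cache-line idea can be illustrated without a stopwatch: memory is fetched in whole lines, so a loop that updates every 16th int can move as much data as one that updates every int. The sketch below is a toy Python model, not Igor's code (his examples measure real timings, which Python's interpreter overhead would hide). It assumes 64-byte cache lines, 4-byte ints and a line-aligned array, and only counts how many distinct lines a strided loop touches. The function names are hypothetical.

```python
# Toy model of the cache-line effect: count the distinct cache lines a
# strided loop over an int array touches. It models memory traffic only and
# does not measure time. Assumptions: 64-byte lines, 4-byte ints, and an
# array that starts on a line boundary.

LINE_BYTES = 64  # assumed cache-line size
ELEM_BYTES = 4   # assumed size of one int element


def lines_touched(n_elems: int, stride: int) -> int:
    """Distinct cache lines touched by `for i in range(0, n_elems, stride): a[i] *= 3`."""
    return len({(i * ELEM_BYTES) // LINE_BYTES for i in range(0, n_elems, stride)})


def elements_updated(n_elems: int, stride: int) -> int:
    """How many array elements the same loop actually updates."""
    return len(range(0, n_elems, stride))


if __name__ == "__main__":
    n = 1 << 18  # 256K ints = 1 MiB under the 4-byte assumption
    for stride in (1, 2, 4, 8, 16, 32, 64):
        print(f"stride={stride:2d}  updates={elements_updated(n, stride):6d}  "
              f"lines={lines_touched(n, stride):5d}")
```

Under these assumptions the line count stays at 16384 for strides 1 through 16 even though the number of updates drops 16-fold; it only starts halving once the stride exceeds one line (32, 64). That flat region is why the timings in such experiments barely change over small strides.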

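Today's NAACL tutorials included a lecture by Richard Socher and Christopher Manning of Stanford on applying deep learning to NLP. Skimming the slides, this version adds some new material compared with last year's ACL version, which makes it a fairly comprehensive tutorial on deep learning for language technology. "Deep Learning for NLP (without Magic)" slides: http://t.cn/zHHyKUo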

Lots of practical Mahout examples

http://chimpler.wordpress.com/category/mahout/

Tutorials

UBC's machine learning course (2013)

Covers MCMC as well as recent work on deep learning

http://www.cs.ubc.ca/~nando/540-2013/lectures.html

Text mining techniques (Peking University course)

http://www.icst.pku.edu.cn/course/mining/11-12spring/index.html

RBM code in Java; probably the code most to my taste

https://github.com/tjake/rbm-dbn-mnist

The Stanford NLP group has a dedicated page on Deep Learning in Natural Language Processing

http://nlp.stanford.edu/projects/DeepLearningInNaturalLanguageProcessing.shtml

Homepage of a leading researcher (Alex Smola)

http://alex.smola.org/

His teaching page, with lots of material

http://alex.smola.org/teaching/

http://www.cs.princeton.edu/courses/archive/spring10/cos424/w/syllabus

The Large Scale Learning class notes

http://cilvr.cs.nyu.edu/doku.php?id=courses:bigdata:slides:start

Algorithm tutorials

Homepage of a Cambridge professor (Zoubin Ghahramani); his PDF on Gaussian processes is very detailed and well explained

http://mlg.eng.cam.ac.uk/zoubin/

A very nice tutorial on variational Bayes

http://people.inf.ethz.ch/bkay/talks/Brodersen_2013_03_22.pdf

Hadoop implementations of collaborative filtering and graph mining

https://code.google.com/p/hadoop-network/

Handling big data on a single machine: a collection of useful open-source tools

1. LibFM

Project homepage: http://www.libfm.org/

2. SVDFeature

Project homepage: http://apex.sjtu.edu.cn/apex_wiki/svdfeature

3. Libsvm and Liblinear

libsvm project homepage: http://www.csie.ntu.edu.tw/~cjlin/libsvm/

liblinear project homepage: http://www.csie.ntu.edu.tw/~cjlin/liblinear/

Must-read before first use: the practical guide

Lessons from developing libsvm, by Chih-Jen Lin: http://www.csie.ntu.edu.tw/~cjlin/talks/kdd.pdf
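One step the libsvm practical guide stresses is scaling each attribute linearly (for example to [-1, +1]) before training, and applying the same transform, with the training-set ranges, to the test data. The sketch below illustrates that idea in plain Python; it is not libsvm code (libsvm ships its own `svm-scale` tool for this), the function names are hypothetical, and the handling of constant features is our own choice.

```python
# Linear per-feature scaling to [lower, upper]: fit min/max on the training
# set only, then reuse those ranges for any other data (e.g. the test set).
# A minimal sketch; names are hypothetical, not part of libsvm.
from typing import List, Sequence, Tuple

Row = Sequence[float]


def fit_scaling(train: List[Row]) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum, computed on training data only."""
    n_features = len(train[0])
    mins = [min(row[j] for row in train) for j in range(n_features)]
    maxs = [max(row[j] for row in train) for j in range(n_features)]
    return mins, maxs


def apply_scaling(rows: List[Row], mins: List[float], maxs: List[float],
                  lower: float = -1.0, upper: float = 1.0) -> List[List[float]]:
    """Map each feature linearly so that [min, max] becomes [lower, upper]."""
    scaled = []
    for row in rows:
        out = []
        for x, lo, hi in zip(row, mins, maxs):
            if hi == lo:
                # Constant feature: our choice is the midpoint of the target range.
                out.append((lower + upper) / 2)
            else:
                out.append(lower + (upper - lower) * (x - lo) / (hi - lo))
        scaled.append(out)
    return scaled


if __name__ == "__main__":
    train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
    mins, maxs = fit_scaling(train)
    print(apply_scaling(train, mins, maxs))           # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
    print(apply_scaling([[20.0, 30.0]], mins, maxs))  # [[3.0, 1.0]]
```

Test values outside the training range are deliberately not clipped here, so a test point can land outside [-1, 1]; the important part is that train and test go through the identical transform.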

4. rt-rank

Project homepage: http://research.engineering.wustl.edu/~amohan/

rt-rank implements random forests and gradient boosted decision trees, two methods commonly used in recommender systems, and is easy to use.
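To make the gradient-boosting half concrete, here is a minimal sketch of the general GBDT idea for squared loss: start from the mean, then repeatedly fit a tiny regression tree (a depth-1 "stump") to the current residuals and add a damped copy of it to the model. This is not rt-rank's implementation; it uses a single numeric feature, and every name and default is hypothetical.

```python
# Gradient boosting with depth-1 regression trees ("stumps") for squared loss
# on one numeric feature. A minimal sketch of the GBDT idea, not rt-rank's code.
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def fit_stump(xs: List[float], targets: List[float]) -> Stump:
    """Choose the split that minimises squared error of a two-leaf constant fit."""
    pairs = sorted(zip(xs, targets))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate equal x values
        left = [t for _, t in pairs[:k]]
        right = [t for _, t in pairs[k:]]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2, lv, rv)
    if best is None:  # all x equal: fall back to a single leaf
        mean = sum(targets) / len(targets)
        return (float("inf"), mean, mean)
    return best[1], best[2], best[3]


def stump_predict(s: Stump, x: float) -> float:
    return s[1] if x <= s[0] else s[2]


def fit_gbdt(xs: List[float], ys: List[float], n_rounds: int = 50,
             learning_rate: float = 0.1):
    """Each round fits a stump to the residuals y - F(x), which are the
    negative gradient of the loss 0.5 * (y - F(x))**2."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        s = fit_stump(xs, residuals)
        stumps.append(s)
        preds = [p + learning_rate * stump_predict(s, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def gbdt_predict(model, x: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    ys = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
    model = fit_gbdt(xs, ys)
    print([round(gbdt_predict(model, x), 2) for x in xs])
```

The small learning rate (shrinkage) means each stump only corrects part of the remaining error; in this step-function example the residual shrinks by a factor of 0.9 per round. Real GBDT libraries grow deeper trees over many features, and random forests instead average independently trained trees.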

5. Mahout

Project homepage: http://mahout.apache.org/

6. MyMediaLite

Project homepage: http://www.ismll.uni-hildesheim.de/mymedialite/

7. GraphLab and GraphChi

GraphLab project homepage: http://graphlab.org/

GraphChi project homepage: http://graphlab.org/graphchi/

GraphChi download: https://code.google.com/p/graphchi/downloads/detail?name=graphchi_src_v0.1.2_toolkits.tar.gz

GraphChi introduction: http://www.technologyreview.com/news/428497/your-laptop-can-now-analyze-big-data/?nlid=nldly&nld=2012-07-17

CF for GraphChi: http://bickson.blogspot.com/2012/08/collaborative-filtering-with-graphchi.html

pylearn2

https://github.com/lisa-lab/pylearn2

It has many features and is updated frequently:

Training algorithms
- A “default training algorithm” that asks the model to train itself
- Stochastic gradient descent, with extensions including
  - Learning rate decay
  - Momentum
  - Polyak averaging
  - Early stopping
  - A simple framework for adding your own extensions
- Batch gradient descent with line searches
- Nonlinear conjugate gradient descent (with line searches)
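The four SGD extensions listed above fit in a few lines when the model is tiny. The sketch below trains 1-D linear regression with learning-rate decay, heavy-ball momentum, Polyak averaging (a running mean of the iterates after a burn-in) and early stopping on a validation set. It is not pylearn2's API; the decay schedule, burn-in rule, synthetic data and all names and defaults are our own assumptions.

```python
# Plain SGD on 1-D linear regression with learning-rate decay, momentum,
# Polyak averaging and early stopping. A minimal sketch, not pylearn2's API;
# every name and default here is hypothetical.
import random
from typing import List, Tuple

Data = List[Tuple[float, float]]


def make_data(n: int, w_true=2.0, b_true=-1.0, noise=0.1, seed=0) -> Data:
    """Synthetic data y = w_true * x + b_true + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, w_true * x + b_true + rng.gauss(0.0, noise)))
    return data


def mse(params: Tuple[float, float], data: Data) -> float:
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def sgd(train: Data, valid: Data, lr0=0.05, decay=1e-3, momentum=0.9,
        max_epochs=100, patience=5, seed=0) -> Tuple[float, float]:
    rng = random.Random(seed)
    w = b = 0.0
    vw = vb = 0.0                # momentum ("velocity") buffers
    avg_w = avg_b = 0.0          # Polyak average of the iterates
    n_avg, t = 0, 0
    burn_in = len(train)         # start averaging after the first epoch
    best_val, best_params, bad_epochs = float("inf"), (w, b), 0
    order = list(range(len(train)))
    for _ in range(max_epochs):
        rng.shuffle(order)
        for i in order:
            x, y = train[i]
            err = w * x + b - y
            gw, gb = 2.0 * err * x, 2.0 * err   # gradient of (w*x + b - y)**2
            lr = lr0 / (1.0 + decay * t)        # learning-rate decay
            vw = momentum * vw - lr * gw        # heavy-ball momentum
            vb = momentum * vb - lr * gb
            w, b = w + vw, b + vb
            t += 1
            if t > burn_in:                     # running mean of later iterates
                n_avg += 1
                avg_w += (w - avg_w) / n_avg
                avg_b += (b - avg_b) / n_avg
        params = (avg_w, avg_b) if n_avg else (w, b)
        val = mse(params, valid)
        if val < best_val:                      # early stopping on validation MSE
            best_val, best_params, bad_epochs = val, params, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_params


if __name__ == "__main__":
    train, valid = make_data(200, seed=1), make_data(100, seed=2)
    w, b = sgd(train, valid)
    print(f"w={w:.3f}  b={b:.3f}  valid MSE={mse((w, b), valid):.4f}")
```

Averaging smooths out the noise that momentum and a still-large step size leave in the last iterate, and early stopping returns the best averaged parameters seen on held-out data rather than the final ones.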
Model Estimation Criteria
- Score Matching
- Denoising Score Matching
- Noise-Contrastive Estimation
- Cross-entropy
- Log-likelihood
Models
- Autoencoders, including Contractive and Denoising Autoencoders
- RBMs, including Gaussian and ssRBM. Varying levels of integration into the full framework.
- k-means
- Local Coordinate Coding
- Maxout networks
- PCA
- Spike-and-Slab Sparse coding
- SVMs (we provide a wrapper around scikit-learn that makes it easy to train a multiclass SVM on dense training data in a memory-efficient way, which doesn’t always happen using scikit-learn directly)
- Partial implementation of DBMs (contact Ian Goodfellow if you would like to complete it)
Datasets:
- MNIST, MNIST with background and rotations
- STL-10
- CIFAR-10, CIFAR-100
- NIPS Workshops 2011 Transfer Learning Challenge
- UTLC
- NORB
- Toronto Faces Dataset
Dataset pre-processing
- Contrast normalization
- ZCA whitening
- Patch extraction (for implementing convolution-like algorithms)
- The Coates+Lee+Ng CIFAR processing pipeline
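The simplest form of the contrast normalization listed above is global contrast normalization of each example: subtract the example's own mean, then divide by its standard deviation, so every image patch ends up with zero mean and unit contrast. The sketch below shows that one variant in plain Python; it is not pylearn2's routine, and the `scale` and `eps` parameters are our own assumptions (library versions typically offer more options).

```python
# Global contrast normalization of one example (e.g. a flattened image patch).
# A minimal sketch; `eps` is our own guard against dividing by zero when the
# input is constant.
import math
from typing import List, Sequence


def global_contrast_normalize(x: Sequence[float], scale: float = 1.0,
                              eps: float = 1e-8) -> List[float]:
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]                        # zero mean
    std = math.sqrt(sum(c * c for c in centered) / len(x))  # population std
    return [scale * c / max(std, eps) for c in centered]    # unit std


if __name__ == "__main__":
    out = global_contrast_normalize([1.0, 2.0, 3.0, 4.0])
    print([round(v, 4) for v in out])  # [-1.3416, -0.4472, 0.4472, 1.3416]
```

A constant patch comes back as all zeros instead of producing a division by zero, which matters for flat image regions such as sky or blank background.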
Miscellaneous algorithms and utilities:
- AIS
- Weight visualization for single layer networks
- Can plot learning curves showing how user-configured quantities change during learning