This section discusses algorithm analysis and big-O notation. Algorithm efficiency is generally measured by a priori analysis; the usual approach is to "select from the algorithm a primitive operation that is basic to the problem (or class of algorithms) under study, and take the number of times that basic operation is repeatedly executed as the algorithm's time measure." It is at this point that the author introduces big-O notation.
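To make the counting idea concrete, here is a minimal sketch in Python (my own illustration, not from the book; the choice of matrix multiplication as the example and the name count_basic_ops are mine). It takes the innermost multiply-accumulate of an n×n matrix multiplication as the basic operation and counts its executions, which gives exactly the T(n) = n³ count behind the T(n) = O(n³) expression mentioned below:

def count_basic_ops(n):
    """Multiply two n-by-n all-ones matrices; count executions of the basic operation."""
    a = [[1] * n for _ in range(n)]
    b = [[1] * n for _ in range(n)]
    c = [[0] * n for _ in range(n)]
    count = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]  # the chosen basic operation
                count += 1                    # one more execution of it
    return count

for n in (10, 20, 40):
    print(n, count_basic_ops(n))  # prints n and n**3: 1000, 8000, 64000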
In this book, the author's introduction of big-O notation feels rather hasty. The expression T(n) = O(n³) simply appears out of nowhere, and only at the very bottom of the page, in small print, is the "formal definition of O" given: if f(n) is a function of the positive integer n, then xₙ = O(f(n)) means there exists a positive constant M such that |xₙ| ≤ M|f(n)| whenever n ≥ n₀. Perhaps my mathematical background is just too weak, but that definition left me completely baffled. I do not know why the author did not spend even a little space on where big-O notation comes from and how it is defined. I googled it and found the following explanation:
Definition: A theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation f(n) = O(g(n)) means it is less than some constant multiple of g(n). The notation is read, "f of n is big oh of g of n".
Formal Definition: f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.
Note: As an example, n² + 3n + 4 is O(n²), since n² + 3n + 4 < 2n² for all n > 10. Strictly speaking, 3n + 4 is O(n²), too, but big-O notation is often misused to mean equal to rather than less than. The notion of "equal to" is expressed by Θ(n²).
The importance of this measure can be seen in trying to decide whether an algorithm is adequate, but may just need a better implementation, or the algorithm will always be too slow on a big enough input. For instance, quicksort, which is O(n log n) on average, running on a small desktop computer can beat bubble sort, which is O(n²), running on a supercomputer if there are a lot of numbers to sort. To sort 1,000,000 numbers, the quicksort takes 20,000,000 steps on average, while the bubble sort takes 1,000,000,000,000 steps!
Any measure of execution must implicitly or explicitly refer to some computation model. Usually this is some notion of the limiting factor. For one problem or machine, the number of floating point multiplications may be the limiting factor, while for another, it may be the number of messages passed across a network. Other measures which may be important are compares, item moves, disk accesses, memory used, or elapsed ("wall clock") time.
(The above is quoted from: Paul E. Black, "big-O notation", in Dictionary of Algorithms and Data Structures, Paul E. Black, ed., NIST.)
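The quicksort-versus-bubble-sort numbers in that entry are easy to reproduce. Here is a small Python sketch (my own, not part of the dictionary entry; the function names are mine, and the quicksort is a simple first-element-pivot version) that counts element comparisons for both sorts on the same random input. When n grows 5×, the bubble-sort count grows roughly 25×, while the quicksort count grows only a little more than 5×:

import random

def bubble_sort_comparisons(a):
    """Sort a copy of a with bubble sort; return the number of comparisons made."""
    a = a[:]
    count = 0
    for i in range(len(a) - 1, 0, -1):
        for j in range(i):
            count += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return count

def quicksort_comparisons(a):
    """Return the comparisons a simple quicksort makes on a (O(n log n) on average)."""
    if len(a) <= 1:
        return 0
    pivot, rest = a[0], a[1:]
    less = [x for x in rest if x <= pivot]
    greater = [x for x in rest if x > pivot]
    # one partition pass compares each remaining element to the pivot once
    return len(rest) + quicksort_comparisons(less) + quicksort_comparisons(greater)

for n in (100, 500, 2500):
    data = [random.random() for _ in range(n)]
    print(n, bubble_sort_comparisons(data), quicksort_comparisons(data))

Scaled up to 1,000,000 items, the same growth rates give the figures quoted above: about n² = 10¹² steps for bubble sort versus roughly n·log₂n ≈ 2×10⁷ for quicksort.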
In addition, this post also discusses how to estimate the time complexity of an algorithm, and it explains the topic in a very accessible way.