七、排序(4)——qsort()
生活随笔
收集整理的這篇文章主要介紹了
七、排序(4)——qsort()
小編覺得挺不錯的,現(xiàn)在分享給大家,幫大家做個參考.
一、回顧
| 冒泡排序 | O(n2) | √ | √ |
| 插入排序 | O(n2) | √ | √ |
| 選擇排序 | O(n2) | × | √ |
| 快速排序 | O(nlogn) | × | √ |
| 歸并排序 | O(nlogn) | √ | × |
| 計數(shù)排序 | O(n+k),k是數(shù)據(jù)范圍 | √ | × |
| 桶排序 | O(n) | √ | × |
| 基數(shù)排序 | O(dn) d是維度 | √ | × |
- 線性排序算法的時間復雜度比較低,適用場景比較特殊。
- 小規(guī)模數(shù)據(jù)進行排序 ==》時間復雜度是 O(n2) 的算法;
- 大規(guī)模數(shù)據(jù)進行排序 ==》時間復雜度是O(nlogn) 的算法 ==》一般也是首選
二、快速排序的優(yōu)化
關鍵點:合理選擇分區(qū)點,來避免時間復雜度退化為O(n2)
理想的分區(qū)點:被區(qū)分點分開的兩個分區(qū)中,數(shù)據(jù)的數(shù)量差不多。
==》常用、簡單的分區(qū)算法
1、三數(shù)取中法
從區(qū)間的首、尾、中間,分別取出一個數(shù),然后進行比較,取三個數(shù)的中位數(shù)作為分區(qū)點。
==》擴展:數(shù)據(jù)大時,“五數(shù)取中”、“十數(shù)取中”
2、隨機法
每次從要排序的區(qū)間中,隨機選擇一個元素作為分區(qū)點。
特點:雖不能保證每次分區(qū)點都選的比較好,但是從概率角度分析,不太可能出現(xiàn)每次分區(qū)點都很差的情況。
3、警惕堆棧溢出
快速排序使用堆棧實現(xiàn)的。==》 遞歸要警惕堆棧溢出
解決方法:
(1)限制遞歸深度。一旦遞歸過深(超過事先設定的閾值),就停止遞歸。
(2)通過在堆上模擬實現(xiàn)一個函數(shù)調用棧,手動模擬遞歸壓棧、出棧的過程,使得系統(tǒng)棧大小沒有限制。
三、qsort()函數(shù)的分析
1、源碼
/* Copyright (C) 1991,1992,1996,1997,1999,2004 Free Software Foundation, Inc.This file is part of the GNU C Library.Written by Douglas C. Schmidt (schmidt@ics.uci.edu).The GNU C Library is free software; you can redistribute it and/ormodify it under the terms of the GNU Lesser General PublicLicense as published by the Free Software Foundation; eitherversion 2.1 of the License, or (at your option) any later version.The GNU C Library is distributed in the hope that it will be useful,but WITHOUT ANY WARRANTY; without even the implied warranty ofMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNULesser General Public License for more details.You should have received a copy of the GNU Lesser General PublicLicense along with the GNU C Library; if not, write to the FreeSoftware Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA02111-1307 USA. *//* If you consider tuning this algorithm, you should consult first:Engineering a sort function; Jon Bentley and M. Douglas McIlroy;Software - Practice and Experience; Vol. 23 (11), 1249-1265, 1993. */#include <alloca.h> #include <limits.h> #include <stdlib.h> #include <string.h>/* Byte-wise swap two items of size SIZE. */ /* 以字節(jié)為單位交換兩個SIZE長度數(shù)組的SWAP宏 */ #define SWAP(a, b,size) do { register size_t __size = (size);register char *__a = (a), *__b = (b); do{har __tmp = *__a;*__a++ = *__b;*__b++ = __tmp;} while (--__size > 0); } while (0)/* Discontinue quicksort algorithm when partition gets below this size.This particular magic number was chosen to work best on a Sun 4/260. */ /* 當一個數(shù)據(jù)段的長度小于這個值的時候,將不用快排對其進行排序。因為這個特別的的魔術數(shù)在Sun 4/260下的性能最好*/ #define MAX_THRESH 4 /* Stack node declarations used to store unfulfilled partition obligations. */ /* 用來存儲還沒處理的數(shù)據(jù)段索引*/ typedef struct {char *lo;char *hi; } stack_node;/* The next 4 #defines implement a very fast in-line stack abstraction. */ /* 下面四個宏實現(xiàn)了一個非常快速的棧(的確很快,但是現(xiàn)實中真要這樣做,請確保1. 這是核心代碼。2. 這些代碼百分比正確。要不然出錯了,調試調死你,維護的人罵死你。)*/ /* The stack needs log (total_elements) entries (we could even subtract log(MAX_THRESH)).Since total_elements has type size_t, we get as upper bound for log (total_elements):bits per byte (CHAR_BIT) * sizeof(size_t). */ /* 棧需要log(total_elements)個元素(當然我們也可以在這個基礎上減去log(MAX_THRESH)。)(PS:log(x/y) = log(x) - log(y))因為傳入數(shù)據(jù)個數(shù)total_elements的類型是size_t,所以我們可以把棧元素上限設為:size_t的位數(shù)(CHAR_BIT) * sizeof(size_t)。PS:棧用來記錄的是還沒處理的數(shù)據(jù)段索引,最壞的情況是分開的兩個數(shù)據(jù)段其中一個已經(jīng)不用再分,這個時候棧不許要任何記錄。最好的情況是進行l(wèi)og(total_elements)次劃分,此時棧需要記錄log(total_elements)個索引,但是這個算法在一個分片的元素個數(shù)小于MAX_THRESH便不再劃分,所以實際只需log(total_elements / MAX_THRESH)個空間。CHAR_BIT在limits.h有定義,意思是一個字節(jié)有多少位,因為sizeof是算出一種類型占幾個字節(jié),所以(CHAR_BIT) * sizeof(size_t)是當total_elements去最大值的值,也就是這里棧元素個數(shù)的上限。*/ #define STACK_SIZE (CHAR_BIT * sizeof(size_t)) #define PUSH(low, high) ((void) ((top->lo = (low)), (top->hi = (high)), ++top)) #define POP(low, high) ((void) (--top, (low = top->lo), (high = top->hi))) #define STACK_NOT_EMPTY (stack < top)/* Order size using quicksort. This implementation incorporatesfour optimizations discussed in Sedgewick:這個快排的程序實現(xiàn)了Sedgewick書中討論的四個優(yōu)化,下面從大到小說明:(大概這意思...)1. Non-recursive, using an explicit stack of pointer that store thenext array partition to sort. To save time, this maximum amountof space required to store an array of SIZE_MAX is allocated on thestack. Assuming a 32-bit (64 bit) integer for size_t, this needsonly 32 * sizeof(stack_node) == 256 bytes (for 64 bit: 1024 bytes).Pretty cheap, actually.1. 不用遞歸,用了顯示的指針棧存儲下一段要排序的數(shù)據(jù)。為了節(jié)省時間,為棧申請了最大的存儲空間。假設size_t是一個32位(64位)的整數(shù),這里僅需要 32 * sizeof(stack_node) = 256 bytes(對于64位:1024bytes)。事實上很小。(一個棧節(jié)點兩指針,32位就是2 * 4 個字節(jié),64位是8 * 2位字節(jié))2. Chose the pivot element using a median-of-three decision tree. This reduces the probability of selecting a bad pivot value andeliminates certain extraneous comparisons.2. 用中值決策樹選擇關鍵值。這減小了選擇一個差關鍵值的可能性和排除特定的無關的比較。3. Only quicksorts TOTAL_ELEMS / MAX_THRESH partitions, leavinginsertion sort to order the MAX_THRESH items within each partition.This is a big win, since insertion sort is faster for small, mostlysorted array segments.3. 只用快排對TOTAL_ELEMS / MAX_THRESH個數(shù)據(jù)段進行了排序,用插入排序對每個數(shù)據(jù)段的MAX_THRESH個數(shù)據(jù)進行排序。這是一個很好的改進,因為插入排序在處理小的、基本有序的數(shù)據(jù)段時比快排更快。 4. The larger of the two sub-partitions is always pushed onto thestack first, with the algorithm then concentrating on thesmaller partition. This *guarantees* no more than log (total_elems)stack size is needed (actually O(1) in this case)!4. 大的數(shù)據(jù)分段通常先壓入棧內,算法優(yōu)先處理小的數(shù)據(jù)分段。這就保證棧的元素不會超過log(total_elems)(事實上這里只用了常數(shù)個空間)。 */ void _quicksort (void *const pbase, size_t total_elems, size_t size, __compar_d_fn_t cmp, void *arg) {/* 寄存器指針,最快的指針,當然系統(tǒng)不一定會把它放到寄存器。register只是一種建議。*/register char *base_ptr = (char *) pbase;const size_t max_thresh = MAX_THRESH * size;if (total_elems == 0)/* Avoid lossage with unsigned arithmetic below. */return;if (total_elems > MAX_THRESH){char *lo = base_ptr;char *hi = &lo[size * (total_elems - 1)];/* 因為用了上面棧的宏,所以下面兩個變量的名字一定不能改...*/stack_node stack[STACK_SIZE];stack_node *top = stack;PUSH (NULL, NULL);while (STACK_NOT_EMPTY){char *left_ptr;char *right_ptr;/* Select median value from among LO, MID, and HI. RearrangeLO and HI so the three values are sorted. This lowers theprobability of picking a pathological pivot value and skips a comparison for both the LEFT_PTR and RIGHT_PTR in the while loops. *//* 在數(shù)組的第一位、中間一位、最后一位中選出一個中值。同時也會對第一位和最后一位進行排序以達到這三個值都是有序的目的。這降低了選擇一個很爛的關鍵值的可能性,同時也跳過了左指針和右指針在while循環(huán)里面的一次比較。*/char *mid = lo + size * ((hi - lo) / size >> 1);if ((*cmp) ((void *) mid, (void *) lo, arg) < 0)SWAP (mid, lo, size);if ((*cmp) ((void *) hi, (void *) mid, arg) < 0)SWAP (mid, hi, size);elsegoto jump_over;if ((*cmp) ((void *) mid, (void *) lo, arg) < 0)SWAP (mid, lo, size); jump_over:;left_ptr = lo + size;right_ptr = hi - size;/* Here's the famous ``collapse the walls'' section of quicksort.Gotta like those tight inner loops! They are the main reasonthat this algorithm runs much faster than others. *//* 這里就是快排中著名的“collapse the walls(推墻??)”。和那些緊湊的內層循環(huán)非常像!它們是這個算法比其他算法快的主要原因。PS:了解過快排的應該對下面這一段都比較熟悉,就是把比關鍵值小的移到數(shù)組左邊,比關鍵值大的移到數(shù)組右邊,把數(shù)據(jù)分成大小兩段的過程*/do{while ((*cmp) ((void *) left_ptr, (void *) mid, arg) < 0)left_ptr += size;while ((*cmp) ((void *) mid, (void *) right_ptr, arg) < 0)right_ptr -= size;if (left_ptr < right_ptr){SWAP (left_ptr, right_ptr, size);if (mid == left_ptr)mid = right_ptr;else if (mid == right_ptr)mid = left_ptr;left_ptr += size;right_ptr -= size;}else if (left_ptr == right_ptr){left_ptr += size;right_ptr -= size;break;}}while (left_ptr <= right_ptr);/* Set up pointers for next iteration. First determine whetherleft and right partitions are below the threshold size. If so,ignore one or both. Otherwise, push the larger partition'sbounds on the stack and continue sorting the smaller one. *//* 給下次迭代的指針賦值。首先判斷左右兩段數(shù)據(jù)的元素個數(shù)是否小于閾值,如果是,跳過這一個或兩個分段。如若不是,把大的數(shù)據(jù)段的開始和結束指針入棧,繼續(xù)劃分小的數(shù)據(jù)段。*/if ((size_t) (right_ptr - lo) <= max_thresh){/* 左右兩個數(shù)據(jù)段的元素都小于閾值,取出棧中數(shù)據(jù)段進行劃分*/if ((size_t) (hi - left_ptr) <= max_thresh)/* Ignore both small partitions. */POP (lo, hi);else/* Ignore small left partition. (只有左邊大于閾值)*/lo = left_ptr;}else if ((size_t) (hi - left_ptr) <= max_thresh)/* Ignore small right partition. (只有右邊大于閾值)*/hi = right_ptr;else if ((right_ptr - lo) > (hi - left_ptr)){/* Push larger left partition indices. *//* 兩個數(shù)據(jù)段的元素個數(shù)都大于閾值,大的入棧,小的繼續(xù)劃分。 */PUSH (lo, right_ptr);lo = left_ptr;}else{/* Push larger right partition indices. *//* 兩個數(shù)據(jù)段的元素個數(shù)都大于閾值,大的入棧,小的繼續(xù)劃分。 */PUSH (left_ptr, hi);hi = right_ptr;}}}/* Once the BASE_PTR array is partially sorted by quicksort the restis completely sorted using insertion sort, since this is efficientfor partitions below MAX_THRESH size. BASE_PTR points to the beginningof the array to sort, and END_PTR points at the very last element inthe array (*not* one beyond it!). *//* 當數(shù)組經(jīng)過快排的排序后,已經(jīng)是整體有序了。剩下的排序由插入排序完成,因為數(shù)據(jù)段小于MAX_THRESH時,插入排序效率更高。此時排序的首指針是數(shù)組的首指針,尾指針是數(shù)組的尾指針(不是倒數(shù)第二個)*/#define min(x, y) ((x) < (y) ? (x) : (y)){char *const end_ptr = &base_ptr[size * (total_elems - 1)];char *tmp_ptr = base_ptr;char *thresh = min(end_ptr, base_ptr + max_thresh);register char *run_ptr;/* Find smallest element in first threshold and place it at thearray's beginning. This is the smallest array element,and the operation speeds up insertion sort's inner loop. *//* 找出第一段的最小一個值并把它放在數(shù)組的第一個位置。這是數(shù)組的最小元素(用快排排過,應該比較容易理解),這一步可以加入插入排序的內層循環(huán)*/for (run_ptr = tmp_ptr + size; run_ptr <= thresh; run_ptr += size)if ((*cmp) ((void *) run_ptr, (void *) tmp_ptr, arg) < 0)tmp_ptr = run_ptr;if (tmp_ptr != base_ptr)SWAP (tmp_ptr, base_ptr, size);/* Insertion sort, running from left-hand-side up to right-hand-side. *//* 從左到右執(zhí)行插入排序。*/run_ptr = base_ptr + size;while ((run_ptr += size) <= end_ptr){/*上面說的加速內層循環(huán),就是不用在這里判斷*/tmp_ptr = run_ptr - size;/*一直往回找直到找到大于或等于當前元素的元素*/while ((*cmp) ((void *) run_ptr, (void *) tmp_ptr, arg) < 0)tmp_ptr -= size;/*把當前元素移到找出元素的后面去*/tmp_ptr += size;if (tmp_ptr != run_ptr){char *trav;trav = run_ptr + size;while (--trav >= run_ptr){char c = *trav;char *hi, *lo;/*這個內層循環(huán)只是把每個元素的最后一位移到后面一個元素去*/for (hi = lo = trav; (lo -= size) >= tmp_ptr; hi = lo)* hi = *lo;*hi = c;}}}} }2、分析
通過多種排序算法實現(xiàn)的:
- 當數(shù)據(jù)量較小是,qsort() 會優(yōu)先使用歸并算法來排序輸入數(shù)據(jù)(原因:空間復雜度為O(n),空間換時間)
- 當數(shù)據(jù)量太大,qsort() 使用快速排序,分區(qū)點選擇方法:“三數(shù)取中法”
- 遞歸太深會導致堆棧溢出問題的解決方法:qsort()實現(xiàn)了一個堆上的棧,手動模擬遞歸來解決該問題。
- 當排序的區(qū)間中的元素小于等于4時,qsort()就退化為插入排序,不再使用遞歸來做快速排序。原因:在小規(guī)模數(shù)據(jù)面前,O(n2時間復雜度并不一定比O(nlogn)的算法執(zhí)行時間長)
- 時間復雜度代表增長趨勢,但是在表示的過程中,我們會省略低階、系數(shù)和常數(shù)。例:O(knlogn + c)時間復雜度中的 k 和 c 可能還是一個比較大的數(shù)。
總結
以上是生活随笔為你收集整理的七、排序(4)——qsort()的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 七、排序(3)——线性排序
- 下一篇: 八、二分查找(Binary Search