Sorting is the process of placing the elements of a collection in a defined order. We can sort a collection of elements numerically, alphabetically, or by length. There are many algorithms we can use to sort the elements of a collection, but they have different running time complexities, so we have to choose carefully when we want to sort things efficiently. So far, the integer sorting methods we have learned are quick sort, bubble sort, insertion sort, selection sort, merge sort, count sort, and tim sort.
If we want to analyze the running time of these sorting techniques, the best way is big-oh notation. Big-oh describes an upper bound, and the various classes fit into a hierarchy as shown:
O(1) ⊂ O(log n) ⊂ O(n) ⊂ O(n log n) ⊂ O(n^a) ⊂ O(b^n) ⊂ O(n!) ⊂ O(n^n)
where n is the list size, a is a natural number greater than 1, and b is a natural number greater than 1.
After some discussions during lectures and a lot of practice in labs, I have a good understanding of these sorting techniques and their complexities. Bubble sort, insertion sort, and selection sort have similar running time complexities: their worst and average running times are all O(n^2). However, the best cases for bubble sort and insertion sort are O(n), while the best case for selection sort is still O(n^2). The difference exists because bubble sort and insertion sort can exit early once they confirm the list is already sorted. Selection sort does not have this property; it is designed to compare each element with every other item no matter what.
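To make the early-exit idea concrete, here is a minimal bubble sort sketch in Python (the function name and details are my own, not taken from lecture):

    def bubble_sort(lst):
        """Sort lst in place using bubble sort with an early-exit flag."""
        n = len(lst)
        for i in range(n - 1):
            swapped = False
            for j in range(n - 1 - i):
                if lst[j] > lst[j + 1]:
                    lst[j], lst[j + 1] = lst[j + 1], lst[j]
                    swapped = True
            if not swapped:  # a full pass with no swaps means the list is sorted
                return       # exit early: this is where the O(n) best case comes from

On an already-sorted list, the first pass makes no swaps, so the function stops after about n comparisons instead of doing all O(n^2) of them.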
Merge sort's running time complexity is very easy to analyze, because it always performs the same process whether or not the list is already sorted. In that respect it is like selection sort, but merge sort is much more efficient: its time complexity for the best, average, and worst cases is O(n log n). It takes approximately log n levels of splitting to break the list down, and merging back up takes at most n comparison steps per level. So the total running time is about n log n steps.
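Here is a minimal merge sort sketch matching the description above (the function name is my own):

    def merge_sort(lst):
        """Return a new sorted list: log n levels of splitting,
        at most n comparison steps of merging per level."""
        if len(lst) <= 1:
            return lst
        mid = len(lst) // 2
        left = merge_sort(lst[:mid])
        right = merge_sort(lst[mid:])
        # merge the two sorted halves into one sorted list
        merged = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged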
Quick sort's general idea is similar to merge sort's. The difference between them is that quick sort has no guarantee of splitting the original list into two equal sublists. The split really depends on the pivot we choose, since the sublists are formed from the numbers that are smaller than the pivot and the numbers that are equal to or larger than the pivot. Hence, if we choose a good pivot, one that divides the original list into two roughly equal sublists, the running time complexity of quick sort is the same as merge sort's, O(n log n). But if we keep choosing bad pivots, ones that leave one of the sublists empty, we end up with a time complexity of O(n^2).
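Here is a minimal quick sort sketch that uses the first element as the pivot, purely for simplicity (the naming and pivot choice are my own):

    def quick_sort(lst):
        """Return a new sorted list; the pivot decides how evenly we split."""
        if len(lst) <= 1:
            return lst
        pivot = lst[0]
        smaller = [x for x in lst[1:] if x < pivot]   # strictly less than pivot
        larger = [x for x in lst[1:] if x >= pivot]   # equal to or larger than pivot
        return quick_sort(smaller) + [pivot] + quick_sort(larger)

With this pivot choice, an already-sorted list is exactly the bad case: smaller is empty on every call, so we get the O(n^2) behaviour described above.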
Count sort is a little special. It is similar to radix sort and bucket sort. It creates a temporary list of zeros whose length is determined by the maximum number in the original list, then runs through the original list, recording how many times each value occurs, and finally builds the sorted list from the temporary list. Since we only run through the original list once, the running time is O(n + k) in the best, average, and worst cases, where k is that maximum value; when k is small relative to n, this is effectively O(n). Count sort works great when n, the size of the original list, is really large. The problem is that if we have a small list with a really big integer in it, count sort has to create a really big temporary list because of that one integer, and that step may take a remarkably long time compared to the other steps. Hence, even though count sort has the best time complexity on paper, Python's built-in sort does not use it.
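A minimal count sort sketch, assuming the list holds non-negative integers (the function name and that assumption are mine):

    def count_sort(lst):
        """Return a new sorted list of non-negative integers."""
        if not lst:
            return []
        # temporary list of zeros: one slot per value from 0 to max(lst);
        # this is the list that blows up when one integer is really big
        counts = [0] * (max(lst) + 1)
        for x in lst:            # one pass over the original list
            counts[x] += 1
        result = []
        for value, count in enumerate(counts):
            result.extend([value] * count)
        return result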
Tim sort is the sort we use every time we call Python's built-in sort on a list. It is a hybrid sorting algorithm that takes advantage of both merge sort and insertion sort. Hence, in every case it achieves the better of the two time complexities. The best case for tim sort is O(n), which is the same as insertion sort. The average and worst cases are O(n log n), which is the same as merge sort.
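To make the hybrid idea concrete, here is a toy timsort-style sketch. It is nothing like CPython's real implementation, which detects natural runs and merges them much more cleverly; it only shows the combination of insertion sort on small runs with merge-sort-style merging:

    def hybrid_sort(lst, run_size=32):
        """Toy hybrid: insertion-sort small runs, then merge runs pairwise."""
        def insertion_sort(sub):
            for i in range(1, len(sub)):
                key = sub[i]
                j = i - 1
                while j >= 0 and sub[j] > key:
                    sub[j + 1] = sub[j]
                    j -= 1
                sub[j + 1] = key
            return sub

        def merge(left, right):
            merged, i, j = [], 0, 0
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    merged.append(left[i])
                    i += 1
                else:
                    merged.append(right[j])
                    j += 1
            return merged + left[i:] + right[j:]

        # insertion sort is fast on short (and nearly sorted) runs
        runs = [insertion_sort(lst[i:i + run_size])
                for i in range(0, len(lst), run_size)]
        # repeatedly merge neighbouring runs until one sorted list remains
        while len(runs) > 1:
            runs = [merge(runs[i], runs[i + 1]) if i + 1 < len(runs) else runs[i]
                    for i in range(0, len(runs), 2)]
        return runs[0] if runs else []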
Since sorting is a really important area of computer science, I am really looking forward to studying more sorting techniques that I have not learned yet. I believe that if I do well with sorting, it will help me a lot in studying computer science in the future.