Thursday, March 27, 2014

Sorting and Efficiency

        Sorting is the process of placing elements from a collection in some order that we define. We can sort a collection of elements numerically, alphabetically, or by length. There are many algorithms we can use to sort the elements in a collection, but they have different running-time complexities, so we have to choose carefully when we want to sort things efficiently. So far, the integer sorting methods we have learned are quick sort, bubble sort, insertion sort, selection sort, merge sort, count sort, and timsort.
        If we want to analyze the running time of those sorting techniques, the best way is big-oh notation. Big-oh gives an upper bound, and the various classes fit into a hierarchy as shown:
O(1) ⊂ O(log n) ⊂ O(n) ⊂ O(n log n) ⊂ O(n^a) ⊂ O(b^n) ⊂ O(n!) ⊂ O(n^n)
where n is the list size, a is a natural number greater than 1, and b is a natural number greater than 1.
        After some discussion during lectures and a lot of practice in labs, I had a good understanding of those sorting techniques and their complexity. Bubble sort, insertion sort, and selection sort have similar running-time complexity: their worst and average running times are O(n^2). However, the best case for bubble sort and insertion sort is O(n), while the best case for selection sort is still O(n^2). The difference exists because bubble sort and insertion sort can stop early once they confirm the list is already sorted, while selection sort does not have this property: it is designed to compare each element with every other element no matter what.
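        To see why that early exit matters, here is a minimal sketch of bubble sort with an early exit (my own illustration, not our course code):

        def bubble_sort(lst: list) -> None:
            """Sort lst in place using bubble sort with an early exit."""
            n = len(lst)
            for i in range(n - 1):
                swapped = False
                # each pass bubbles the largest remaining item toward the end
                for j in range(n - 1 - i):
                    if lst[j] > lst[j + 1]:
                        lst[j], lst[j + 1] = lst[j + 1], lst[j]
                        swapped = True
                if not swapped:
                    # no swaps means the list is already sorted: the O(n) best case
                    return

If the input list is already sorted, the inner loop makes one pass with no swaps and the function returns immediately.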
        Merge sort's running-time complexity is very easy to analyze, because it goes through the same process whether or not the list is already sorted. In that sense it is similar to selection sort, but merge sort is much more efficient: its time complexity for the best, average, and worst cases is O(n log n). It takes approximately log n levels of splitting to break the list down, and merging at each level takes at most n comparison steps, so the total running time is about n log n steps.
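        Here is a rough sketch of the splitting and merging I described (my own version, not necessarily the lecture code):

        def merge_sort(lst: list) -> list:
            """Return a new sorted list made from the items of lst."""
            if len(lst) <= 1:
                return lst
            mid = len(lst) // 2
            # splitting in half gives about log n levels of recursion
            left = merge_sort(lst[:mid])
            right = merge_sort(lst[mid:])
            # merging the two halves takes at most n comparisons per level
            merged, i, j = [], 0, 0
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    merged.append(left[i])
                    i += 1
                else:
                    merged.append(right[j])
                    j += 1
            return merged + left[i:] + right[j:]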
        Quick sort's general idea is similar to merge sort's. The difference between them is that there is no guarantee that quick sort splits the original list into two equally sized sublists. The split really depends on the pivot we choose, since the sublists are formed from the numbers smaller than the pivot and the numbers equal to or larger than the pivot. Hence, if we choose a good pivot that divides the original list into two roughly equal sublists, the running-time complexity of quick sort is the same as merge sort's, O(n log n). But if we keep choosing bad pivots, ones that leave one of the sublists nearly empty, we end up with a time complexity of O(n^2).
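        A simple sketch of quick sort that uses the first element as the pivot (just to show the idea, not an optimized version) would be:

        def quick_sort(lst: list) -> list:
            """Return a new sorted list; efficiency depends on the pivot choice."""
            if len(lst) <= 1:
                return lst
            pivot = lst[0]  # a poor choice if the list is already sorted
            smaller = [x for x in lst[1:] if x < pivot]
            larger = [x for x in lst[1:] if x >= pivot]
            return quick_sort(smaller) + [pivot] + quick_sort(larger)

With this pivot choice, an already sorted list puts everything into larger on every call, which is exactly the O(n^2) case.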
        Count sort is a little special. It is similar to radix sort and bucket sort. It creates a temporary list of zeros whose length is the maximum value in the original list, then runs through the original list once, recording how many times each value appears, and finally builds a sorted list from that temporary list. Since we only run through the original list once, the running-time complexity is O(n) in the best, average, and worst cases (assuming the maximum value is not much larger than n). It works great when n, the size of the original list, is really large. The problem is that if we have a small list containing one really big integer, count sort has to create a really big temporary list because of that one value, and that step can take a remarkably long time compared to the other steps. Hence, even though count sort has the best time complexity on paper, it is not what Python's built-in sort uses.
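        A minimal sketch of count sort for non-negative integers (my own simplified version) shows where that big temporary list comes from:

        def count_sort(lst: list) -> list:
            """Return a new sorted list of the non-negative integers in lst."""
            counts = [0] * (max(lst) + 1)   # as long as the largest value, not the list
            for x in lst:
                counts[x] += 1
            result = []
            for value, count in enumerate(counts):
                result.extend([value] * count)
            return result

Calling count_sort([3, 1000000, 2]) only has three items to count, but it still builds a temporary list with over a million entries.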
        Timsort is the sort we use every time we call Python's built-in sort on a list. It is a hybrid sorting algorithm that takes advantage of both merge sort and insertion sort, so in each case it achieves the better of the two complexities. The best case for timsort is O(n), which is the same as insertion sort, and the average and worst cases are O(n log n), which is the same as merge sort.
        Since sorting is a really important area in computer science, I am really looking forward to studying more sorting techniques that I have not learned yet. I believe that doing well in sorting will help me a lot as I continue studying computer science.


Wednesday, March 19, 2014

Week 10 A2 Experience

        In the past two weeks, I have been working on assignment two with my partners. It apparently was not an easy task. We began by making sense of the information that was provided, then thought about how to approach the questions, and finally wrote and tested our code. That sounded like a really good plan, but realistically, nothing was as simple as we thought.
        Assignment two's topic is regex, also known as regular expressions. A regex is a string built from symbols with specific meanings. An example from our assignment is '((1.(0|2)*).e)'.
        Our task was to write code for several functions. Everything went well until we got to the last function, regex_match. This function requires us to return True if and only if a given string matches the given regex. We approached this problem the way our professor taught us in lecture, but we still could not get it right. After we talked about it with a friend who is in a different group, he suggested we use the examples on the discussion board to test our code. Amazingly, we passed all of them.
        In the end, we concluded that we had misunderstood the question at the beginning, and that our code had actually worked ever since we wrote it. However, I still have my doubts. I guess I have to wait for the unittest results from MarkUs.

Thursday, March 13, 2014

Week 9 Searching Helper - Binary Search Tree

        The binary search tree is a new class we learned about in lectures. It is a tree class with a root and children, and it is designed for fast searching. It has some properties that make this possible. First of all, the left subtree only contains nodes whose values are less than the value at the root. Similarly, the right subtree only contains nodes whose values are greater than the value at the root. Additionally, the left and right subtrees must themselves be binary search trees. The last necessary rule is that no duplicate nodes are allowed.
        With the above properties and certain methods defined, if we use a binary search tree to search for a value, the worst-case time complexity in big-O notation is O(n), which happens when the tree is completely unbalanced, while a balanced tree gives the best case of O(log n), where n is the number of elements in the tree. Compared to linear searching methods, that is a remarkable improvement.
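        A minimal sketch of how the search works (a simplified node class of my own, not the exact class from lecture):

        class BSTNode:
            """A binary search tree node with a value and two subtrees."""
            def __init__(self, value, left=None, right=None):
                self.value = value
                self.left = left
                self.right = right

        def bst_search(node, value) -> bool:
            """Return True iff value is somewhere in the tree rooted at node."""
            if node is None:
                return False
            if value == node.value:
                return True
            if value < node.value:
                return bst_search(node.left, value)    # smaller values live on the left
            return bst_search(node.right, value)       # greater values live on the right

Each comparison throws away one whole subtree, which is why a balanced tree only needs about log n steps.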
        However, there are still a lot of things about binary search trees that I am not familiar with, such as how to code the delete method. I will spend the rest of this week trying to get a better understanding of them, so I can be prepared for our upcoming test.

Thursday, March 6, 2014

Week 8 Magic list: LinkedList

        LinkedList, a common recursive data structure, has been our topic in both lectures and tutorials over the past two weeks. It is commonly used to solve problems created by the built-in list; an example would be the class Queue I discussed at the beginning of the year. In the Queue class we created, we used a list to store and modify information. When we need to remove an item from the beginning of the list, every other item has to shift forward by one index, which can take a remarkably long time if the list is large. If we use a LinkedList to store the information instead, we can avoid this problem.
        A LinkedList is defined recursively: it is either empty, which is represented by None, or an object that contains a cargo item and a reference to another LinkedList. A simple example of __init__ would be:
        def __init__(self, item, rest: 'LinkedList') -> None:
            self.item = item
            self.rest = rest
We can add more instance variables to give LinkedList more attributes, such as self.empty, self.size, etc. On the other hand, we can write methods, or a helper class, to make LinkedList behave more like a built-in list. For example, we can have __getitem__, __setitem__, __delitem__, and insert to access the item at a specific index, as well as __eq__, __str__, __repr__, and reverse.
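        As a sketch of what one of those methods might look like, here is a rough __getitem__ that assumes the __init__ above and a None-terminated list (my own guess at it, not the course solution):

        def __getitem__(self, index: int) -> object:
            """Return the item at position index by following .rest index times."""
            node = self
            for _ in range(index):
                if node.rest is None:
                    raise IndexError('LinkedList index out of range')
                node = node.rest
            return node.item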
        We can make LinkedList behave more and more like the built-in list, while it keeps its own advantages. However, it also has disadvantages. There is a possibility that it becomes an infinite list, which happens when self.rest refers to self, or to some LinkedList node that comes before self.
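        A tiny example of how that can happen, using the __init__ above:

        lnk = LinkedList(3, None)   # a one-item list
        lnk.rest = lnk              # .rest now points back at lnk itself
        # following .rest will never reach None, so the list looks infinite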
        There are still a lot of things for me to discover about LinkedList. I will try to fully understand it, and switch flexibly between LinkedList and the built-in list when I code. I believe it will help me solve a lot of run-time problems like the one in Queue in the future.

Thursday, February 27, 2014

Recursion

       Recursion is a very powerful tool we can use in Python. In fact, most programming languages support recursion, which shows its importance in computer science. In real life, a lot of problems become much easier after we break them into pieces and solve those subproblems one by one. Recursion lets us do the same thing when we code.
       In general, recursion means "defining something in terms of itself"; in Python, this means calling a function inside its own body. However, for a recursive function to work properly, it has to have at least one base case and at least one general case.
       A general case is where the function calls itself recursively. Sometimes we need to update the input for the general case every time the function recurs, depending on what the function requires. A recursive function may need to call itself many times to reach its goal, but it surely makes the problem a lot easier than solving it as a whole. A base case, on the other hand, is a restriction on the recursive function. It is defined to stop the recursion when necessary; otherwise, the function will keep calling itself until the computer runs out of memory or Python raises a maximum recursion depth error.
       A simple example of recursion is nested_list_max, which finds and returns the maximum integer in a nested list:
        def nested_list_max(L: list) -> int:
            """Return the maximum in nested list L."""
            return max([nested_list_max(i) if isinstance(i, list) else i for i in L])
This function builds a list using a list comprehension: it goes through every item in list L; if the item is an integer, it adds it to the new list, and if the item is a list, the function calls itself on that item. After the function goes through every item in L, it finds the maximum number in the list built by the comprehension. No matter how deep the nested list L is, the function will always visit every element in it.
       The above problem looks pretty simple, but if we wanted to code it without recursion, it would be really hard, since we would need to handle whatever depth the input list L happens to have. Once again, it is really important to have a base case and a general case. In this example, my base case is that if item i is not a list, the list comprehension adds i to the new list; my general case is that if i is a list, the list comprehension adds the result of nested_list_max(i) to the new list.
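       To check my understanding, here is how the function behaves on a couple of sample inputs:

        >>> nested_list_max([3, [7, [2, 9]], 5])
        9
        >>> nested_list_max([1, 2, 3])
        3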

      Recursion can do a lot more than the example I showed, and there are still a lot of things I need to learn about it. Once I can ace recursion, I believe coding will be a lot easier in the future.

Thursday, February 13, 2014

Week 6: Recursion Structure - Tree

       After learning how to code recursion, we learned about trees, a recursive structure, this week. It took me some time to get used to the traversals: pre-order, in-order, and post-order. It became much easier once I understood the recursion within them. For pre-order, we just visit the root first, then pre-order (recursively) the left subtree, then pre-order (recursively) the right subtree. So if the left or right subtree is a leaf, we can just return that node; if the subtree is an internal node, we treat it as a new tree and analyze it the same way. In-order and post-order are very similar to pre-order; the difference is that they visit the root, the left subtree, and the right subtree in different orders.
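       To make the orders concrete, here is a sketch of the three traversals for a simple binary tree with value, left, and right attributes (my own simplified version, not the lecture class):

        def preorder(t) -> list:
            """Return the values of tree t in the order root, left subtree, right subtree."""
            if t is None:
                return []
            return [t.value] + preorder(t.left) + preorder(t.right)

        def inorder(t) -> list:
            """Return the values in the order left subtree, root, right subtree."""
            if t is None:
                return []
            return inorder(t.left) + [t.value] + inorder(t.right)

        def postorder(t) -> list:
            """Return the values in the order left subtree, right subtree, root."""
            if t is None:
                return []
            return postorder(t.left) + postorder(t.right) + [t.value]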

       After that, we started to learn the code for those traversals, but it was kind of confusing. It was quite amazing that the code for such complicated tasks could be so short. I thought I understood the code after some discussion with classmates. In this week's lab, there was a similar problem. I tried to solve it in one line, but the only thing I got was a headache. My partner, on the other hand, combined a list comprehension with a ternary expression and coded it in one line, which was exactly what the professor showed us during lecture. But I did not know how to use them properly. Hence, my plan for reading week is to ace recursion, list comprehensions, and ternary expressions, as well as getting ready for our mid-term test.
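       I think the one-line pattern my partner used looks something like this, for a tree whose children are stored in a list (my guess at the idea, not his exact code):

        def preorder(t) -> list:
            """Return the values of t in pre-order, in one line."""
            return [] if t is None else [t.value] + sum([preorder(c) for c in t.children], [])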

Sunday, February 9, 2014

Week 5 Learning Recursion

       Continuing from last week's lectures, recursion was still the top topic this week. I did more practice tracing recursion and learned how to code recursive functions. When the professor talked about tracing during lectures, he also told us how to be lazy: never trace a call whose output we already know. That saves plenty of time.
       In lectures, one example of coding recursion was the Tower of Hanoi. It was quite easy to code, because I knew how to play the game and the professor gave step-by-step instructions for it. Besides, the professor had already written the base case for this recursion, which made the lecture seem really easy.
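       From memory, the classic three-peg version looks roughly like this (not the exact lecture code):

        def hanoi(n: int, source: str, helper: str, target: str) -> None:
            """Print the moves that shift n disks from source to target using helper."""
            if n == 0:
                return    # base case: nothing left to move
            hanoi(n - 1, source, target, helper)    # move n - 1 disks out of the way
            print('move disk {} from {} to {}'.format(n, source, target))
            hanoi(n - 1, helper, source, target)    # move them on top of the largest disk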
       However, I ran into a lot of problems when I wrote the code for the recursion in assignment one, since nobody gave me the base case, the general case, or step-by-step instructions this time. I failed a couple of times because I forgot to write code for the base case, so my recursion just kept repeating until I found the mistake.

       I think it will still take me a long time before I can ace recursion on a test.