From Module 1: Parallel computing (using threads, MPI, GPU computing)
A process can be viewed as a running program, usually in an isolated environment (with its own memory) and managed by the operating system. With the evolution of processes and operating system schedulers, it became possible to run several processes in parallel on a single machine, or in a distributed environment across multiple computers: that is called multiprocessing, and we will discuss it more in the next chapter.
A thread is nothing more than a running subprogram. A process can launch many threads that run in parallel (this is called multithreading). Since the threads of a process all belong to that single process, they can share its memory. In contrast to processes, threads are managed (created, killed, …) entirely by the program that created them. These specificities give threads multiple advantages:
Hyper-threading is a technique by which a CPU divides its physical cores into virtual cores that are treated as if they were actual physical cores by the operating system.
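As a quick check, Python's os.cpu_count() reports the number of logical cores the operating system sees; on a hyper-threaded machine this is typically twice the number of physical cores (a minimal sketch, not a way to detect hyper-threading itself):

```python
import os

# Number of *logical* cores visible to the OS; with hyper-threading this is
# usually 2x the physical core count
logical_cores = os.cpu_count()
print(logical_cores)
```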
A thread has a starting phase, an execution sequence, and a conclusion; it can also be interrupted or paused for a certain amount of time. Each thread we create can have its own role and execution path, which makes threads a very good way to use the resources of the machine we have, whether to train a simple model on limited data or to execute any other task.
Each programming language has its own way of dealing with threads. Although the Java approach is not very far from the Python one, they differ in some ways, and this section becomes more technical from here on. We will therefore use Python for the rest of this section and throughout the book. But I have to stress that Python is very limited at multithreading because of the Global Interpreter Lock (GIL), which makes threads run almost sequentially. You may want to read more about it here
Some basic operations applicable to a thread include starting it, waiting for it to finish (joining it), and checking whether it is still running.
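With Python's threading module, which we will use below, these operations look like this (a minimal sketch; the worker function and its sleep duration are arbitrary placeholders):

```python
import threading
import time

def task():
    time.sleep(0.1)  # pretend to do some work

t = threading.Thread(target=task, name="worker")  # create the thread
t.start()                                         # start its execution
print(t.is_alive())                               # usually True: still working
t.join()                                          # wait for its conclusion
print(t.is_alive())                               # False: it has terminated
```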
In Python, one way to create a thread is the _thread module. It exposes a function called start_new_thread for this purpose. Its signature is _thread.start_new_thread(function, args[, kwargs]), where function is the function we want our thread to execute and the remaining parameters are the arguments we want to pass to that function. This module is low-level and a more complicated way to deal with threads.
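For illustration, a minimal sketch with _thread (the worker function and the shared results list are hypothetical; note that threads created this way are killed abruptly when the main thread exits, hence the final sleep):

```python
import _thread
import time

results = []

def worker(message, count):
    # append the message a few times, simulating periodic work
    for _ in range(count):
        results.append(message)
        time.sleep(0.01)

# start_new_thread(function, args) launches the function in a new thread
_thread.start_new_thread(worker, ("Thread-1", 3))

time.sleep(0.5)  # keep the main thread alive long enough for the worker
print(results)
```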
The threading module is a more modern way to create a thread, and it is really simple. It provides some useful methods to manage threads, such as start, join, and is_alive.
In order to launch a thread using the threading module, we can subclass threading.Thread, override its run method, and call start:
#!/usr/bin/python
import threading
import time

class myThread(threading.Thread):
    def __init__(self, message):
        threading.Thread.__init__(self)
        self.message = message

    def run(self):
        while True:
            print(self.message)
            time.sleep(1)

# Create new threads
thread1 = myThread("Thread-1")
thread2 = myThread("Thread-2")
thread3 = myThread("Thread-3")
thread4 = myThread("Thread-4")

# Start new threads
thread1.start()
thread2.start()
thread3.start()
thread4.start()
The execution of this program gives this output:
Thread-1
Thread-3
Thread-4
Thread-2
Thread-1
Thread-3
Thread-4
Thread-2
Thread-1
Thread-4
Thread-3
Thread-2
Thread-1
Thread-4
Thread-3
We can see that the starting order of those threads is not respected in the output, and that is because of the operating system's task scheduler (see the thread execution graph).
After creating a thread, we can also kill it. Killing a thread actually means stopping its execution as a subprogram. This can be done in different ways:
As we saw in the last example, when different threads are started, they run in parallel with no predefined order or synchronization. This can lead to many problems, such as several threads trying to access the same piece of data, which can make the data inconsistent or produce garbled output (like getting Hlleo instead of Hello on your console). Such problems arise when we do not tell the computer how to manage threads in an organized manner. The synchronization concept is therefore very important with threads. There are many cases where we need it, for example:
When a thread needs to wait until another thread terminates
If we want all the threads to terminate before we continue with our program, we must synchronize them using the join method discussed earlier. To do so, we add this to our last code:
threads = []

# Add threads to thread list
threads.append(thread1)
threads.append(thread2)
threads.append(thread3)
threads.append(thread4)

# Wait for all threads to complete
for t in threads:
    t.join()

# Continue here
This is useful whenever we want to use the results of the threads in a later part of our program.
When working with threads, we may encounter pretty weird behaviour and get results we don't expect, due to a phenomenon called a race condition. Let's assume there are 20 threads working in parallel on a task (executing the same function), and at some point those threads have to read and update the same variable.
At some point, 2 threads will try to access and update the shared variable at the same time. The problem occurs when both threads have already read the value of the variable, so each holds its own local copy. If one thread then updates the variable, the other will not see the new value and will overwrite it.
The program may then end up giving 85, for example, instead of 100, and this is a huge problem: in a banking app, for instance, people could end up receiving less or more money than they are supposed to get.
This can happen even within a single line of code, because one source instruction may compile down to several low-level instructions, and the processor may execute only a few of them before switching context to another thread.
For example, this line:
money = money + val
expands to several low-level instructions, and it may therefore be subject to race conditions.
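We can verify this with the standard dis module, which shows the bytecode instructions CPython generates for that single assignment (the exact instruction names vary between Python versions, so we only count them here):

```python
import dis

money = 0
val = 10

def update():
    global money
    money = money + val

dis.dis(update)  # prints one bytecode instruction per line

# A single source line expands to several instructions (loads, add, store, ...)
instructions = list(dis.Bytecode(update))
print(len(instructions))
```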
Let’s take an example to illustrate that:
#!/usr/bin/python
from threading import Thread
import time

money = 0

def updateMoney():
    global money
    value = money
    time.sleep(.001)
    money = value + 10

# create threads
threads = []
for i in range(20):
    threads.append(Thread(target=updateMoney, args=()))

# start the threads
for i in range(20):
    threads[i].start()

# join the threads
for i in range(20):
    threads[i].join()

print('The final value of money is ', money)
The output varies from one run to another: our program is unpredictable. We can solve this problem quite easily with Python's Lock (the same principle exists in other languages).
Python locks are built to prevent exactly this kind of behaviour: we acquire the lock, update the variable, and then release the lock.
To do so, we must update our code to integrate a Lock:
#!/usr/bin/python
from threading import Thread, Lock
import time

money = 0
lock = Lock()

def updateMoney():
    global money
    lock.acquire()
    value = money
    time.sleep(.001)
    money = value + 10
    lock.release()

# the rest of the program
And then our result will always be 200.
Locks are important, but they may cause other problems. We now know that we can lock a part of our code, and another thread will wait for the current thread to release the lock before accessing the variable. It may happen that two threads each lock a different resource and then need the one that the other holds. This is called a deadlock.
Both threads then just block there, waiting for each other.
An example is the following case:
#!/usr/bin/python
from threading import Thread, Lock
import time

lock1 = Lock()
lock2 = Lock()

def updateMoney1():
    lock1.acquire()
    print('John block the account of Jane')
    time.sleep(1)
    lock2.acquire()
    print("John send money to Jane")
    lock2.release()
    lock1.release()

def updateMoney2():
    lock2.acquire()
    print('Jane block the account of John')
    time.sleep(1)
    lock1.acquire()
    print("Jane send money to John")
    lock1.release()
    lock2.release()

# create threads
threads = []
threads.append(Thread(target=updateMoney1, args=()))
threads.append(Thread(target=updateMoney2, args=()))

# start the threads
for i in range(2):
    threads[i].start()

# join the threads
for i in range(2):
    threads[i].join()
The program prints:

John block the account of Jane
Jane block the account of John

and then freezes: each thread waits forever for the lock the other one holds, so nothing runs again.
There is no general solution to deadlocks, so avoiding them depends largely on how you write your code. Try to avoid situations where two threads can block each other; a better way to handle this kind of situation is to execute the blocking sections one after another. A good practice is also to wrap lock-protected instructions in a try block that guarantees the release, because if an exception is thrown and your thread stops before releasing a lock, it may block the others:
lock.acquire()
try:
    assert(False)  # something goes wrong in the critical section
except:
    print('ouf, we had a catch block')
finally:
    lock.release()  # the lock is released no matter what happens
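A more idiomatic way to get the same guarantee is to use the lock as a context manager: the with statement acquires the lock and always releases it, even if an exception is raised. A minimal sketch reusing the money example from before:

```python
from threading import Thread, Lock

money = 0
lock = Lock()

def update_money():
    global money
    with lock:  # acquired here, released automatically on exit
        money = money + 10

threads = [Thread(target=update_money) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(money)  # → 200, and no lock is ever left unreleased
```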
A thread pool is, as the name says, a pool of threads. It is mostly used when we have to create a huge number of threads (in a big data context, for example). Managing all those threads manually can be difficult, and throughput can end up limited. To solve those problems, we can use a thread pool, whose main purpose is to manage a large number of threads for us.
Let's look directly at an example of a thread pool:
from concurrent.futures import ThreadPoolExecutor

# Large amount of data
numbers = [2, 3, 4, 5, 6]

def double(n):
    return n * 2

def main():
    with ThreadPoolExecutor(max_workers=3) as executor:
        results = executor.map(double, numbers)
        for result in results:
            print(result)

main()
This program prints the expected results, but they are not computed sequentially: we specified a maximum of 3 worker threads for the task. This can be very helpful in cases like the previous one, where we need several threads to process a large amount of data and collect the results.
To use it on a large dataset, we may first have to split the data into chunks manually before passing them to the thread pool.
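A sketch of that manual split, using executor.submit on hypothetical chunks (the data, the chunk size of 4, and the per-chunk function are arbitrary placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(10))

def chunks(seq, size):
    # split the data into consecutive pieces of at most `size` elements
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def process_chunk(chunk):
    return [n * 2 for n in chunk]  # stand-in for real per-chunk work

with ThreadPoolExecutor(max_workers=3) as executor:
    # one future per chunk; futures keep the submission order
    futures = [executor.submit(process_chunk, c) for c in chunks(data, 4)]
    results = [n for f in futures for n in f.result()]

print(results)
```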
Dependency graphs, on the other hand, describe computations composed of several atomic parts or tasks, where some tasks require the results of others as input.
Let's imagine a scenario where we have different data sources, and on each data source we must apply a clean function that is not common to every source. We may end up with dependency graphs like this one:
f1 -> f2,f3,f4,f5
f2 -> f3, f5
To use threads in this context, we must respect the dependency graph.
There is no standard way to deal with this, but the most common solution is asynchronous programming: a dependent function effectively receives a promise and waits until it gets the result from its parent, then continues the computation with that result. The Python asyncio package is a great way to work asynchronously in Python.
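A minimal asyncio sketch of such a graph, with hypothetical tasks f1, f2 and f3, where f2 and f3 both depend on f1's result and can therefore run concurrently once f1 finishes:

```python
import asyncio

async def f1():
    await asyncio.sleep(0.01)  # stands in for real work
    return 1

async def f2(x):
    await asyncio.sleep(0.01)
    return x + 10

async def f3(x):
    await asyncio.sleep(0.01)
    return x + 100

async def main():
    r1 = await f1()  # dependents wait here for their parent's result
    # sibling tasks that only depend on f1 run concurrently
    r2, r3 = await asyncio.gather(f2(r1), f3(r1))
    return r2, r3

print(asyncio.run(main()))
```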
After this hard work, we can introduce OpenMP, a major standard built to greatly facilitate the integration of threading in your code, with implementations in many languages. In Python, due to the GIL restrictions on threads, there is no standard way of using it; you may want to check out the Java version of OpenMP or libraries like Cython, Pythran, …
Despite Python's many restrictions on threads, it is still possible to parallelize our code using the threading approach. It is helpful, in a data analytics setup, for using the full computational capacity of our devices… But most of the time the computation is just too heavy and the amount of data too big to be handled by a single computer. In that case, we must use a network-based environment, connecting multiple computers and using the computational power of each. This introduces concepts like MPI that we will discuss in the upcoming chapters.