Python Multiprocessing: Syntax, Usage, and Examples
Python multiprocessing allows you to run multiple processes in parallel, leveraging multiple CPU cores for improved performance. Unlike multithreading, which is limited by Python’s Global Interpreter Lock (GIL), multiprocessing lets you execute CPU-bound tasks concurrently. The multiprocessing module provides tools to create, manage, and synchronize processes, making it ideal for heavy computations, parallel data processing, and task automation.
How to Use Python Multiprocessing
The multiprocessing module provides the Process class, which you can use to create and run independent processes. Here's how you can create a new process:
import multiprocessing

def worker_function():
    print("Process running")

# Creating a process
process = multiprocessing.Process(target=worker_function)

# Starting the process
process.start()

# Waiting for the process to finish
process.join()
- multiprocessing.Process(target=function_name): Creates a new process that runs the specified function.
- .start(): Starts the process execution.
- .join(): Makes the main process wait for the child process to finish.
When to Use Multiprocessing in Python
Multiprocessing is ideal when your code needs to handle CPU-intensive tasks. Here are three common situations where multiprocessing improves performance:
Parallelizing CPU-Intensive Tasks
When processing large datasets, using multiple CPU cores can significantly speed up execution.
import multiprocessing

def calculate_square(n):
    return n * n

numbers = [1, 2, 3, 4, 5]

with multiprocessing.Pool(processes=3) as pool:
    results = pool.map(calculate_square, numbers)

print(results)  # Output: [1, 4, 9, 16, 25]
Handling Multiple Processes in Web Scraping
Web scraping involves sending multiple requests, which can be parallelized to speed up data retrieval.
import multiprocessing
import requests

def fetch_url(url):
    response = requests.get(url)
    return f"{url}: {response.status_code}"

urls = ["https://example.com", "https://google.com", "https://github.com"]

with multiprocessing.Pool(processes=3) as pool:
    results = pool.map(fetch_url, urls)

print(results)
Efficiently Processing Large Files
If you need to process large files, multiprocessing lets you split the work across multiple CPU cores.
import multiprocessing

def count_lines(file_name):
    with open(file_name, 'r') as file:
        # Iterate lazily instead of loading the whole file into memory
        return sum(1 for _ in file)

files = ["file1.txt", "file2.txt", "file3.txt"]

with multiprocessing.Pool() as pool:
    line_counts = pool.map(count_lines, files)

print(line_counts)
Examples of Python Multiprocessing
Running Multiple Processes Concurrently
import multiprocessing

def worker(task_id):
    print(f"Task {task_id} is running")

tasks = [multiprocessing.Process(target=worker, args=(i,)) for i in range(3)]

for task in tasks:
    task.start()

for task in tasks:
    task.join()
Using a Shared Memory Variable
When multiple processes need to share data, you can use Value
or Array
from the multiprocessing module.
import multiprocessing

def increment(shared_counter):
    for _ in range(100000):
        with shared_counter.get_lock():
            shared_counter.value += 1

counter = multiprocessing.Value('i', 0)  # Shared integer value

processes = [multiprocessing.Process(target=increment, args=(counter,)) for _ in range(2)]

for p in processes:
    p.start()

for p in processes:
    p.join()

print(f"Final counter value: {counter.value}")
Using a Process Pool for Task Distribution
If you have many small tasks, a process pool distributes the workload efficiently.
import multiprocessing

def square(n):
    return n * n

numbers = [1, 2, 3, 4, 5]

with multiprocessing.Pool(processes=2) as pool:
    results = pool.map(square, numbers)

print(results)  # Output: [1, 4, 9, 16, 25]
Learn More About Python Multiprocessing
Python ThreadPool vs. Multiprocessing
ThreadPool works best for I/O-bound tasks like web scraping or file handling, while multiprocessing excels at CPU-bound tasks like numerical computation or image processing. The right choice depends on whether your bottleneck is CPU time or time spent waiting on I/O.
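As a rough sketch of why threads help with I/O-bound work, the example below uses multiprocessing.pool.ThreadPool with a made-up fake_download function (a stand-in for a real network call), so the waits can overlap:

```python
import time
from multiprocessing.pool import ThreadPool

def fake_download(url):
    time.sleep(0.2)  # Stand-in for network latency
    return f"done: {url}"

urls = [f"https://example.com/{i}" for i in range(4)]

start = time.perf_counter()
with ThreadPool(processes=4) as pool:
    results = pool.map(fake_download, urls)
elapsed = time.perf_counter() - start

# The four 0.2-second waits overlap, so this takes roughly 0.2s rather than 0.8s
print(results)
print(f"elapsed: {elapsed:.2f}s")
```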
Implementing a Rate Limit for Python Multiprocessing Requests
When making multiple HTTP requests, you can prevent overloading a server by implementing a rate limit.
import multiprocessing
import time

def fetch_data(url):
    time.sleep(1)  # Simulate rate limiting by pausing before each request
    print(f"Fetched data from {url}")

urls = ["https://example.com", "https://api.github.com", "https://google.com"]

with multiprocessing.Pool(processes=2) as pool:
    pool.map(fetch_data, urls)
Managing Shared Memory in Python Multiprocessing
Multiprocessing lets processes share data structures such as lists and dictionaries through multiprocessing.Manager(), which serves them from a manager process via proxies.
import multiprocessing

def add_to_list(shared_list, value):
    shared_list.append(value)

with multiprocessing.Manager() as manager:
    shared_list = manager.list()
    processes = [multiprocessing.Process(target=add_to_list, args=(shared_list, i)) for i in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(list(shared_list))  # Contains 0-4, though the order may vary
Python Multiprocessing Contexts and Fork Issues
Python provides three start methods for multiprocessing:
- fork (Unix only; the child inherits a copy of the parent's memory)
- spawn (starts a fresh Python interpreter)
- forkserver (forks new workers from a clean server process)

If you run into segmentation faults due to fork, use spawn instead:
import multiprocessing

if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")
    process = multiprocessing.Process(target=print, args=("Hello from a spawned process!",))
    process.start()
    process.join()
Preventing Deadlocks with Python Multiprocessing Lock
When multiple processes modify shared resources, use multiprocessing.Lock() to prevent conflicts.
import multiprocessing

def task(lock, name):
    with lock:
        print(f"{name} is running")

# Pass the lock explicitly so the example works with any start method
lock = multiprocessing.Lock()

processes = [multiprocessing.Process(target=task, args=(lock, f"Process {i}")) for i in range(3)]

for p in processes:
    p.start()

for p in processes:
    p.join()
Using Multiprocessing Logging
Logging across multiple processes can be tricky, but multiprocessing.Queue() helps.
import multiprocessing
import logging
import logging.handlers  # Needed for QueueHandler

def worker(log_queue):
    logger = logging.getLogger()
    handler = logging.handlers.QueueHandler(log_queue)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("Logging from process")

log_queue = multiprocessing.Queue()

process = multiprocessing.Process(target=worker, args=(log_queue,))
process.start()
process.join()

while not log_queue.empty():
    print(log_queue.get())  # Each item is a logging.LogRecord
When to Avoid Multiprocessing
Multiprocessing has some downsides:
- High Overhead: Creating and managing processes requires more memory than threads.
- Not Always Faster: If a task involves a lot of data transfer between processes, the overhead can slow things down.
- Not Suitable for Short Tasks: The time taken to start and terminate processes can outweigh the benefits.
If your program is mostly I/O-bound or requires frequent inter-process communication, consider using multithreading instead.
Python multiprocessing is a powerful tool for optimizing performance when working with CPU-bound tasks. By leveraging multiple CPU cores, you can significantly speed up data processing, image transformations, and complex calculations.
However, multiprocessing comes with trade-offs, including higher memory usage and potential synchronization challenges.
Sign up or download Mimo from the App Store or Google Play to enhance your programming skills and prepare for a career in tech.