How to Iterate Over Rows in Pandas (with Examples)

Pandas, a cornerstone of data analysis in Python, provides powerful and efficient ways to manipulate tabular data. While Pandas is designed for vectorized operations (operating on entire columns or DataFrames at once), there are situations where you need to process data row by row. This article provides a deep dive into the various methods for iterating over rows in a Pandas DataFrame, discussing their pros and cons, performance implications, and best-practice recommendations.

1. Introduction: The Need for Row Iteration (and When to Avoid It)

Before we jump into the “how,” it’s crucial to understand the “why” and, more importantly, the “when not to.” Pandas excels at vectorized operations. These operations leverage optimized, low-level implementations (often in C or Fortran) to perform calculations on entire arrays of data simultaneously. This is drastically faster than processing data element by element.

When to AVOID Row Iteration:

  • Simple arithmetic or logical operations on columns: If you can express your operation using built-in Pandas functions or operators (e.g., adding two columns, filtering rows based on a condition, calculating a new column based on existing ones), always do so; see the short sketch after this list. This will be orders of magnitude faster than iterating.
  • Applying the same function to every row (without row-specific logic): Pandas’ apply() function (discussed later) is generally much better suited for this than explicit row iteration.
  • Aggregations and calculations that can be done with groupby(): If you need to perform calculations within groups of data, groupby() is almost always the right tool.
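
As a quick illustration of the first point, a task like "compute a line total, then keep only the large orders" needs no loop at all. A minimal sketch (the column names here are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'price': [10.0, 25.0, 7.5], 'qty': [3, 1, 2]})

# Arithmetic on whole columns at once -- no loop required
df['total'] = df['price'] * df['qty']

# Filtering rows with a boolean condition
expensive = df[df['total'] > 20]

print(expensive)
```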

When Row Iteration MIGHT Be Necessary:

  • Complex, row-specific logic: If the operation you need to perform on each row depends on the specific values within that row in a way that cannot be easily vectorized, row iteration might be unavoidable. This often involves conditional statements that branch based on multiple column values within a single row.
  • Interacting with external systems (row by row): If you need to send data from each row to an external API, database, or file, one row at a time, iteration is often necessary.
  • Building up a result row by row (with dependencies between rows): If the calculation for row n depends on the results calculated for row n-1, you’ll likely need iteration (a minimal sketch follows this list). This is less common in typical data analysis but can occur in time series analysis or simulations.
  • Debugging and understanding data: Sometimes, iterating through a small sample of rows can be helpful for inspecting data and understanding how your code is working, especially during development.
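
To make the third point concrete, here is a minimal sketch of a genuinely sequential calculation: a running balance floored at zero, so each row's result depends on the previous row's (the deposit column and the floor rule are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'deposit': [100, -30, 50, -120]})

balances = []
balance = 0
for deposit in df['deposit']:
    # The clamp at zero makes this a true recurrence: unlike a plain
    # cumulative sum, it cannot be expressed with cumsum() alone.
    balance = max(balance + deposit, 0)
    balances.append(balance)

df['balance'] = balances
print(df)
```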

Key Takeaway: Always strive to use vectorized operations whenever possible. Row iteration should be a last resort due to its significant performance impact, especially on large DataFrames.

2. Iteration Methods: A Detailed Overview

We’ll now explore the various methods for iterating over rows in Pandas, starting with the most commonly (and often incorrectly) used methods and moving towards more efficient and recommended approaches.

2.1 iterrows()

DataFrame.iterrows() is a generator that yields both the index and the row data (as a Pandas Series) for each row in the DataFrame. It’s often the first method people encounter when learning Pandas, but it’s generally not the most efficient choice.

```python
import pandas as pd

data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C'], 'col3': [4.5, 6.7, 8.9]}
df = pd.DataFrame(data)

for index, row in df.iterrows():
    print(f"Index: {index}")
    print(f"Row data:\n{row}")
    print(f"Value in col1: {row['col1']}")  # Accessing a specific column
    print("---")
```

Output:

```
Index: 0
Row data:
col1      1
col2      A
col3    4.5
Name: 0, dtype: object
Value in col1: 1
---
Index: 1
Row data:
col1      2
col2      B
col3    6.7
Name: 1, dtype: object
Value in col1: 2
---
Index: 2
Row data:
col1      3
col2      C
col3    8.9
Name: 2, dtype: object
Value in col1: 3
---
```

Pros of iterrows():

  • Easy to understand and use: The syntax is straightforward, making it accessible for beginners.
  • Access to both index and row data: You get both pieces of information in each iteration.

Cons of iterrows():

  • Slow: iterrows() is generally very slow, especially for large DataFrames. It creates a new Series object for each row, which involves significant overhead.
  • Data type inconsistencies: The Series objects returned by iterrows() might not preserve the original data types of the DataFrame columns, especially if the DataFrame has mixed data types. This can lead to unexpected behavior. For instance, an integer column might be converted to a float if there’s a float column in the same DataFrame.
  • Modifying the DataFrame during iteration is NOT recommended: depending on the dtypes, the row Series returned by iterrows() may be a copy rather than a view of the original data, so writing to it may silently have no effect on the DataFrame.

Example demonstrating data type issues with iterrows():

```python
import pandas as pd
import numpy as np

# An all-numeric DataFrame: one int64 column, one float64 column
data = {'col1': [1, 2, 3], 'col3': [4.5, 6.7, np.nan]}
df = pd.DataFrame(data)

for index, row in df.iterrows():
    print(f"Type of col1 in row: {type(row['col1'])}")
    print(f"Type of col3 in row: {type(row['col3'])}")

print(f"\nOriginal DataFrame dtypes:\n{df.dtypes}")
```

Output:

```
Type of col1 in row: <class 'numpy.float64'>
Type of col3 in row: <class 'numpy.float64'>
Type of col1 in row: <class 'numpy.float64'>
Type of col3 in row: <class 'numpy.float64'>
Type of col1 in row: <class 'numpy.float64'>
Type of col3 in row: <class 'numpy.float64'>

Original DataFrame dtypes:
col1      int64
col3    float64
dtype: object
```

Notice that col1, which is int64 in the DataFrame, becomes a numpy.float64 inside the iterrows() loop. Each row is handed back as a single Series with one common dtype, so the integers are upcast to accommodate the float64 values in col3. (If the DataFrame also contains a string column, the rows come back with object dtype instead, which preserves the original values but brings its own surprises.)

2.2 itertuples()

DataFrame.itertuples() is another generator that yields rows as namedtuples. This is generally faster than iterrows() because namedtuples are more lightweight than Series objects.

```python
import pandas as pd

data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C'], 'col3': [4.5, 6.7, 8.9]}
df = pd.DataFrame(data)

for row in df.itertuples():
    print(row)
    print(f"Value in col1: {row.col1}")  # Accessing by attribute
    print(f"Value in col2: {row[2]}")    # Accessing by position (position 0 is the Index)
    print("---")
```

Output:

```
Pandas(Index=0, col1=1, col2='A', col3=4.5)
Value in col1: 1
Value in col2: A
---
Pandas(Index=1, col1=2, col2='B', col3=6.7)
Value in col1: 2
Value in col2: B
---
Pandas(Index=2, col1=3, col2='C', col3=8.9)
Value in col1: 3
Value in col2: C
---
```

Pros of itertuples():

  • Faster than iterrows(): Namedtuples are more efficient than Series.
  • Access by attribute or position: You can access column values using attribute notation (e.g., row.col1) or positional indexing (e.g., row[2]; position 0 holds the index unless you pass index=False).
  • Preserves data types (better than iterrows()): itertuples() generally does a better job of preserving the original data types of the DataFrame.

Cons of itertuples():

  • Still slower than vectorized operations: While faster than iterrows(), it’s still significantly slower than using vectorized operations.
  • Column names must be valid Python identifiers: names containing spaces or special characters are silently replaced with positional placeholders (such as _1, _2), so those columns must be accessed by position. The name parameter controls the namedtuple’s class name, and index=False drops the index from the tuples.
  • Modifying the original DataFrame through itertuples() is not recommended: just like iterrows(), it should be treated as read-only.

Example with custom names and disabling the index:

```python
import pandas as pd

data = {'My Column 1': [1, 2, 3], 'Another Column': ['A', 'B', 'C']}
df = pd.DataFrame(data)

# Column names with spaces are not valid identifiers, so itertuples
# silently renames them to positional fields (_0, _1, ...)
for row in df.itertuples(index=False, name='MyRow'):
    print(row)
    print(row[0])  # Access by position because of the invalid column names
```

Output:

```
MyRow(_0=1, _1='A')
1
MyRow(_0=2, _1='B')
2
MyRow(_0=3, _1='C')
3
```

2.3 List Comprehensions (with to_dict)

You can convert the DataFrame to a list of dictionaries (using to_dict('records')) and then iterate over that list. This can be a reasonable approach for moderate-sized DataFrames, especially if you need to work with the data in a dictionary format.

```python
import pandas as pd

data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C'], 'col3': [4.5, 6.7, 8.9]}
df = pd.DataFrame(data)

list_of_dicts = df.to_dict('records')

for row_dict in list_of_dicts:
    print(row_dict)
    print(f"Value in col1: {row_dict['col1']}")
    print("---")
```

Output:

```
{'col1': 1, 'col2': 'A', 'col3': 4.5}
Value in col1: 1
---
{'col1': 2, 'col2': 'B', 'col3': 6.7}
Value in col1: 2
---
{'col1': 3, 'col2': 'C', 'col3': 8.9}
Value in col1: 3
---
```

Pros of using to_dict('records'):

  • Potentially faster than iterrows(): in practice, performance is comparable to, and sometimes slightly better than, itertuples().
  • Works well with dictionary-based operations: If you need to process the data as dictionaries, this is a natural fit.

Cons of using to_dict('records'):

  • Memory overhead: Converting the entire DataFrame to a list of dictionaries can consume significant memory, especially for large DataFrames.
  • Still slower than vectorized operations: This method is still an iterative approach and won’t be as fast as vectorized operations.
  • Loses index information by default: The index is not included in the dictionaries unless you explicitly add it.

Example with adding the index:

```python
import pandas as pd

data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C'], 'col3': [4.5, 6.7, 8.9]}
df = pd.DataFrame(data)

list_of_dicts = df.reset_index().to_dict('records')  # Reset the index to include it

for row_dict in list_of_dicts:
    print(row_dict)
```

Output:

```
{'index': 0, 'col1': 1, 'col2': 'A', 'col3': 4.5}
{'index': 1, 'col1': 2, 'col2': 'B', 'col3': 6.7}
{'index': 2, 'col1': 3, 'col2': 'C', 'col3': 8.9}
```

2.4 .apply() (Row-wise)

The apply() method is a powerful tool for applying a function along an axis of the DataFrame. While often used for column-wise operations, it can also be used for row-wise operations by setting axis=1.

```python
import pandas as pd

data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C'], 'col3': [4.5, 6.7, 8.9]}
df = pd.DataFrame(data)

def process_row(row):
    # row is passed in as a Series object
    result = row['col1'] * 2 if row['col2'] == 'A' else row['col1']
    return result

df['new_col'] = df.apply(process_row, axis=1)
print(df)
```

Output:

```
   col1 col2  col3  new_col
0     1    A   4.5        2
1     2    B   6.7        2
2     3    C   8.9        3
```

Pros of apply() (row-wise):

  • More concise than explicit loops: It can often express row-wise logic more compactly than using for loops.
  • Can be faster than iterrows() and itertuples(): apply() can sometimes leverage some internal optimizations, making it faster than the explicit iteration methods. However, this is not guaranteed, and the performance depends heavily on the function being applied.
  • Handles data types better than iterrows(): apply() generally preserves data types better.

Cons of apply() (row-wise):

  • Still not as fast as vectorized operations: apply() with axis=1 is essentially a disguised loop. It’s not a true vectorized operation. If your logic can be expressed using vectorized operations, that will always be faster.
  • Performance can vary greatly: The performance of apply() depends heavily on the complexity of the function being applied. Simple functions might be relatively fast, but complex functions (especially those involving Python function calls) can be quite slow.
  • Overhead of function calls: For each row, your function is called, which adds overhead.
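
One mitigation worth knowing about: apply() accepts a raw=True argument that passes each row to your function as a plain NumPy array instead of a Series, skipping the per-row Series construction. The trade-off is positional rather than label-based access, and with mixed dtypes the rows arrive as object arrays, which erodes the benefit. A small sketch reworking the earlier example:

```python
import pandas as pd

data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C'], 'col3': [4.5, 6.7, 8.9]}
df = pd.DataFrame(data)

# With raw=True the function receives an ndarray per row;
# columns are addressed by position (0 = col1, 1 = col2), not by label
df['new_col'] = df[['col1', 'col2']].apply(
    lambda arr: arr[0] * 2 if arr[1] == 'A' else arr[0],
    axis=1,
    raw=True,
)
print(df)
```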

2.5 Vectorized Operations (The Best Approach)

As emphasized throughout this article, vectorized operations are the cornerstone of efficient Pandas code. Whenever possible, you should strive to express your row-wise logic using vectorized operations. This often involves a combination of:

  • Column-wise arithmetic and logical operations: +, -, *, /, ==, <, >, etc.
  • Boolean indexing: Selecting rows based on conditions.
  • np.where() (or Pandas’ where()): Conditional assignment based on a condition.
  • Pandas built-in functions: sum(), mean(), max(), min(), str.contains(), etc.

Example: Rewriting the apply() example with vectorization:

```python
import pandas as pd
import numpy as np

data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C'], 'col3': [4.5, 6.7, 8.9]}
df = pd.DataFrame(data)

# Vectorized solution
df['new_col'] = np.where(df['col2'] == 'A', df['col1'] * 2, df['col1'])
print(df)
```

Output:

```
   col1 col2  col3  new_col
0     1    A   4.5        2
1     2    B   6.7        2
2     3    C   8.9        3
```

This vectorized solution is significantly faster than the apply() solution because it avoids the overhead of Python function calls and leverages NumPy’s optimized array operations.

Example: More complex logic with vectorization:

Let’s say we want to create a new column ‘status’ based on the following rules:

  • If col1 is greater than 2 and col2 is ‘B’, set ‘status’ to ‘High’.
  • If col1 is less than or equal to 2 and col3 is greater than 5, set ‘status’ to ‘Medium’.
  • Otherwise, set ‘status’ to ‘Low’.

```python
import pandas as pd
import numpy as np

data = {'col1': [1, 3, 2, 4, 1], 'col2': ['A', 'B', 'C', 'B', 'A'], 'col3': [4.5, 6.7, 8.9, 2.1, 5.5]}
df = pd.DataFrame(data)

df['status'] = 'Low'  # Default value

# Condition 1
condition1 = (df['col1'] > 2) & (df['col2'] == 'B')
df.loc[condition1, 'status'] = 'High'

# Condition 2
condition2 = (df['col1'] <= 2) & (df['col3'] > 5)
df.loc[condition2, 'status'] = 'Medium'

print(df)
```

Output:

```
   col1 col2  col3  status
0     1    A   4.5     Low
1     3    B   6.7    High
2     2    C   8.9  Medium
3     4    B   2.1    High
4     1    A   5.5  Medium
```

This example demonstrates how to combine boolean indexing and .loc for efficient conditional assignment. This approach is much faster than iterating through the rows and applying the same logic.
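
When the number of conditions grows, np.select() can express the same cascade more compactly than repeated .loc assignments. Here is the same logic rewritten (the two conditions here are mutually exclusive, so the "first match wins" semantics of np.select() gives the same result):

```python
import pandas as pd
import numpy as np

data = {'col1': [1, 3, 2, 4, 1], 'col2': ['A', 'B', 'C', 'B', 'A'], 'col3': [4.5, 6.7, 8.9, 2.1, 5.5]}
df = pd.DataFrame(data)

conditions = [
    (df['col1'] > 2) & (df['col2'] == 'B'),  # -> 'High'
    (df['col1'] <= 2) & (df['col3'] > 5),    # -> 'Medium'
]
choices = ['High', 'Medium']

# The first matching condition wins; rows matching none get the default
df['status'] = np.select(conditions, choices, default='Low')
print(df)
```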

3. Performance Comparison and Benchmarking

To illustrate the performance differences between the various methods, let’s perform a simple benchmark. We’ll create a large DataFrame and measure the time it takes to perform a simple calculation on each row using each method.

```python
import pandas as pd
import numpy as np
import time

# Create a large DataFrame
num_rows = 100000
data = {'col1': np.random.randint(1, 10, num_rows),
        'col2': np.random.choice(['A', 'B', 'C'], num_rows),
        'col3': np.random.rand(num_rows)}
df = pd.DataFrame(data)

# --- iterrows() ---
start_time = time.time()
result_iterrows = []
for index, row in df.iterrows():
    result_iterrows.append(row['col1'] * 2 if row['col2'] == 'A' else row['col1'])
end_time = time.time()
iterrows_time = end_time - start_time

# --- itertuples() ---
start_time = time.time()
result_itertuples = []
for row in df.itertuples():
    result_itertuples.append(row.col1 * 2 if row.col2 == 'A' else row.col1)
end_time = time.time()
itertuples_time = end_time - start_time

# --- to_dict('records') ---
start_time = time.time()
list_of_dicts = df.to_dict('records')
result_todict = []
for row_dict in list_of_dicts:
    result_todict.append(row_dict['col1'] * 2 if row_dict['col2'] == 'A' else row_dict['col1'])
end_time = time.time()
todict_time = end_time - start_time

# --- apply() ---
start_time = time.time()
def process_row(row):
    return row['col1'] * 2 if row['col2'] == 'A' else row['col1']
result_apply = df.apply(process_row, axis=1)
end_time = time.time()
apply_time = end_time - start_time

# --- Vectorized ---
start_time = time.time()
result_vectorized = np.where(df['col2'] == 'A', df['col1'] * 2, df['col1'])
end_time = time.time()
vectorized_time = end_time - start_time

# --- Print results ---
print(f"iterrows() time: {iterrows_time:.4f} seconds")
print(f"itertuples() time: {itertuples_time:.4f} seconds")
print(f"to_dict() time: {todict_time:.4f} seconds")
print(f"apply() time: {apply_time:.4f} seconds")
print(f"Vectorized time: {vectorized_time:.4f} seconds")

# Verify that all results are the same (optional, but good practice)
print(np.array_equal(result_iterrows, result_vectorized))
print(np.array_equal(result_itertuples, result_vectorized))
print(np.array_equal(result_todict, result_vectorized))
print(np.array_equal(result_apply.to_numpy(), result_vectorized))
```

Typical Output (times will vary depending on your hardware):

```
iterrows() time: 7.5534 seconds
itertuples() time: 0.6462 seconds
to_dict() time: 0.9599 seconds
apply() time: 0.9603 seconds
Vectorized time: 0.0029 seconds
```

The results clearly demonstrate the dramatic performance advantage of vectorized operations. iterrows() is by far the slowest, followed by apply(), to_dict('records') and itertuples(). The vectorized approach is orders of magnitude faster than any of the iterative methods. This difference becomes even more pronounced as the DataFrame size increases.

4. Modifying DataFrames During Iteration (and Why You Shouldn’t)

As mentioned earlier, modifying a DataFrame while iterating over it using iterrows() or itertuples() is generally not recommended and can lead to unpredictable results. The row objects returned by these methods are often views or copies of the original data, and changes made to them may not be reflected back in the DataFrame.
If you really do need new values derived during iteration, use an approach that is guaranteed to write to the DataFrame itself: collect the new values in a separate list and assign it to a column afterwards, or write through the DataFrame with .loc/.at using the row's index.

Example (Illustrating the Problem):

```python
import pandas as pd

data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']}
df = pd.DataFrame(data)

# Attempting to modify col1 using iterrows()
for index, row in df.iterrows():
    row['col1'] = row['col1'] * 2  # This might NOT work as expected!

print(df)  # col1 remains unchanged

# Correct way with iterrows: build a new list, then assign it
new_col1_values = []
for index, row in df.iterrows():
    new_col1_values.append(row['col1'] * 2)

df['col1'] = new_col1_values
print(df)
```

Output:

```
   col1 col2
0     1    A
1     2    B
2     3    C
   col1 col2
0     2    A
1     4    B
2     6    C
```

As the first print(df) shows, changes made to row within the loop do not modify the original df. The correct way, if iterrows() must be used, is to collect the new values in a list and assign it to the DataFrame afterwards.
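
If you would rather write back during the loop itself, index into the DataFrame with .at (or .loc) and the row's label; that addresses the DataFrame directly instead of the possibly-copied row object. A minimal sketch (still slow, so reserve it for cases where list-building is awkward):

```python
import pandas as pd

data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']}
df = pd.DataFrame(data)

for index, row in df.iterrows():
    # df.at[...] writes to the DataFrame itself, not to the row copy.
    # Note the loop still sees each row's original values.
    df.at[index, 'col1'] = row['col1'] * 2

print(df)
```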

5. Advanced Techniques and Considerations

5.1 Using numba to Speed Up Iteration

If you absolutely must iterate and your row-wise logic is computationally intensive, you can sometimes get a significant speed boost by using the numba library. numba is a just-in-time (JIT) compiler that can translate Python code (including loops) into highly optimized machine code.

```python
import pandas as pd
import numpy as np
import time
from numba import jit

# Create a large DataFrame
num_rows = 100000
data = {'col1': np.random.randint(1, 10, num_rows),
        'col2': np.random.choice(['A', 'B', 'C'], num_rows),
        'col3': np.random.rand(num_rows)}
df = pd.DataFrame(data)

# Define the row-wise function with @jit.
# nopython mode cannot handle object (string) arrays, so the string
# comparison is done in Pandas first and passed in as a boolean mask.
@jit(nopython=True)
def process_row_numba(col1_values, is_a):
    result = np.empty_like(col1_values)  # Pre-allocate the result to avoid creating a list
    for i in range(len(col1_values)):
        result[i] = col1_values[i] * 2 if is_a[i] else col1_values[i]
    return result

# Extract values as NumPy arrays (important for numba)
col1_values = df['col1'].values
is_a = (df['col2'] == 'A').values

# Call the numba-compiled function
start_time = time.time()
result_numba = process_row_numba(col1_values, is_a)
end_time = time.time()
numba_time = end_time - start_time
print(f'Numba: {numba_time:.4f} seconds')

# Apply the result (same values as the earlier examples)
df['new_col_numba'] = result_numba
print(df.head())
```

Key points about using numba:

  • @jit(nopython=True): The nopython=True argument is crucial. It forces numba to compile the function without using the Python interpreter, which is necessary for significant speedups. If numba cannot compile in nopython mode, it will fall back to object mode (which is usually slower than regular Python).
  • Work with NumPy arrays: numba works best with NumPy arrays. Extract the relevant columns as NumPy arrays before passing them to the numba-compiled function.
  • Pre-allocate the result: Create an empty array using np.empty_like before the loop, and fill in values inside the loop, avoiding slow list appends.
  • Limited Python features: numba doesn’t support all Python features. You might need to rewrite your code to use only the features supported by numba in nopython mode. See the numba documentation for details.
  • Type specificity: You can gain further performance by giving numba an explicit type signature, which lets it compile eagerly and more efficiently (see the sketch below).
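
On the last point, a sketch of eager compilation with an explicit signature (the types here match the integer column and boolean mask from the example above; adjust them to your data):

```python
from numba import jit, int64, boolean

# With an explicit signature, numba compiles once at decoration time
# for exactly these argument types, instead of on the first call
@jit(int64[:](int64[:], boolean[:]), nopython=True)
def double_if_flagged(values, flags):
    result = values.copy()
    for i in range(len(values)):
        if flags[i]:
            result[i] = values[i] * 2
    return result
```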

5.2 Iterating over Chunks of a DataFrame

For extremely large DataFrames that don’t fit in memory, you can iterate over the DataFrame in chunks using the chunksize parameter of pd.read_csv() (or other file-reading functions).

```python
import pandas as pd
import numpy as np

# Assuming you have a large CSV file called 'large_data.csv'
chunksize = 10000  # Process 10,000 rows at a time

for chunk in pd.read_csv('large_data.csv', chunksize=chunksize):
    # Process each chunk (which is a DataFrame).
    # You can use any of the methods discussed above (vectorized is preferred!)
    # on the 'chunk' DataFrame.

    # Example: apply a vectorized operation to the chunk
    chunk['new_col'] = np.where(chunk['col2'] == 'A', chunk['col1'] * 2, chunk['col1'])

    # Do something with the processed chunk (e.g., save it to a file,
    # append it to a list, send it to a database, etc.)
    print(chunk.head())
```

This approach allows you to process massive datasets that wouldn’t otherwise fit in memory. You can combine chunking with any of the iteration methods discussed earlier, but remember that vectorized operations are still the most efficient way to process data within each chunk.
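
If the per-chunk results need to be combined, two common patterns are collecting processed chunks in a list and concatenating once at the end, or streaming each chunk straight to an output file. A sketch of both (the file names and columns follow the hypothetical example above):

```python
import pandas as pd
import numpy as np

processed = []
for i, chunk in enumerate(pd.read_csv('large_data.csv', chunksize=10000)):
    chunk['new_col'] = np.where(chunk['col2'] == 'A', chunk['col1'] * 2, chunk['col1'])

    # Pattern 1: collect and concatenate at the end (the combined result must fit in memory)
    processed.append(chunk)

    # Pattern 2: stream to disk, writing the header only for the first chunk
    chunk.to_csv('processed_data.csv', mode='w' if i == 0 else 'a',
                 header=(i == 0), index=False)

result = pd.concat(processed, ignore_index=True)
```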

5.3 Using .values (Carefully)

The .values attribute of a DataFrame (or its modern equivalent, the to_numpy() method) returns a NumPy array representation of the DataFrame’s data. You could iterate over the rows of this NumPy array directly. However, this is generally not recommended unless you are very careful about data types and are sure you understand the implications.

```python
import pandas as pd
import numpy as np

data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']}
df = pd.DataFrame(data)

for row in df.values:
    print(row)
    print(type(row))
```

Output:

```
[1 'A']
<class 'numpy.ndarray'>
[2 'B']
<class 'numpy.ndarray'>
[3 'C']
<class 'numpy.ndarray'>
```

Pros of iterating over .values:

  • Potentially fast: Accessing the raw NumPy array avoids per-row Series or namedtuple construction.

Cons of iterating over .values:

  • Loss of column names and index: You only get the raw data; the column names and index information are gone.
  • Data type coercion: If your DataFrame has mixed data types, the resulting NumPy array will have a single data type (usually object), which can lead to unexpected behavior and performance issues. This is similar to the data type issues with iterrows().
  • Less readable: Your code becomes less readable because you’re working with positional array access instead of named columns.

Only use .values for iteration if:

  • You really need the absolute maximum performance and have profiled your code to confirm that this is a bottleneck.
  • Your DataFrame has a single, consistent data type.
  • You don’t need the column names or index information.
  • You understand the potential pitfalls of data type coercion.

In most cases, itertuples() or vectorized operations are better choices.

6. Conclusion and Best Practices

Iterating over rows in a Pandas DataFrame should be approached with caution. Vectorized operations are almost always the preferred approach due to their superior performance. However, when row-wise logic is unavoidable, several methods are available:

  • iterrows(): Easy to use but slow and has potential data type issues. Avoid if possible.
  • itertuples(): Faster than iterrows() and preserves data types better. A reasonable choice if iteration is necessary.
  • to_dict('records'): Useful if you need to work with dictionaries. Moderate performance and memory overhead.
  • apply() (with axis=1): Can be more concise than explicit loops but is still essentially a loop and not as fast as vectorization.
  • Vectorized operations: The best approach. Use column-wise operations, boolean indexing, np.where(), and Pandas built-in functions whenever possible.
  • numba: For computationally heavy processes which must be iterated, consider numba to get a near-C level of performance.
  • Chunking: For very large DataFrames, process them in chunks using pd.read_csv(..., chunksize=...).
  • .values (rarely): Only use for direct array iteration if you understand the implications and have profiled your code.

Best Practices Summary:

  1. Prioritize Vectorization: Always try to express your logic using vectorized operations first.
  2. Profile Your Code: If you’re unsure which method is fastest, use profiling tools (like the timeit module or a more advanced profiler) to measure the performance of different approaches; a minimal timeit sketch follows this list.
  3. Avoid iterrows(): It’s rarely the best choice.
  4. Consider itertuples() or to_dict('records'): If you must iterate, these are generally better than iterrows().
  5. Use apply() with Caution: It can be convenient, but remember it’s not a true vectorized operation.
  6. Explore numba for Performance-Critical Loops: If you must iterate and performance is critical, consider using numba to speed up your code.
  7. Process Large DataFrames in Chunks: Use chunksize to handle datasets that don’t fit in memory.
  8. Understand Data Types: Be aware of how different iteration methods handle data types, especially with mixed-type DataFrames.
  9. Don’t Modify DataFrames During Iteration (with iterrows or itertuples): It’s generally unsafe and can lead to unexpected results. Instead, generate a new array or DataFrame.
  10. Readability Matters: While performance is critical, prioritize code readability. Sometimes a slightly less performant but vastly more readable solution is preferable.
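
On point 2, the timeit module (or %timeit in IPython/Jupyter) is a quick way to compare candidate approaches on your own data. A minimal sketch:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.random.randint(1, 10, 100_000),
                   'col2': np.random.choice(['A', 'B', 'C'], 100_000)})

def vectorized():
    return np.where(df['col2'] == 'A', df['col1'] * 2, df['col1'])

def with_itertuples():
    return [r.col1 * 2 if r.col2 == 'A' else r.col1 for r in df.itertuples()]

# Total time for 10 runs of each; lower is better
print('vectorized :', timeit.timeit(vectorized, number=10))
print('itertuples :', timeit.timeit(with_itertuples, number=10))
```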

By following these guidelines, you can write efficient and effective Pandas code that leverages the power of vectorization while still handling situations that require row-wise processing. Remember to always prioritize vectorized operations and use iteration methods judiciously, carefully considering their performance implications.
