Pandas is one of the most widely used Python libraries for data analysis and manipulation. It provides powerful tools to handle structured data efficiently. Among these tools, the .loc[] indexer is essential for accessing and modifying specific parts of a DataFrame.
In this article, we’ll explore how to use loc on a pandas DataFrame for row and column selection, slicing, filtering, updating values, and more. Whether you’re a beginner or an intermediate user, mastering loc can significantly improve your data manipulation skills in pandas.
What is loc in Pandas?
Despite often being called a function, loc is a label-based indexer: a DataFrame property used with square brackets to access a group of rows and columns by labels or boolean arrays. Unlike iloc, which selects by integer position, loc selects by label.
Here is the basic syntax:
df.loc[<row_label>, <column_label>]
<row_label> can be a single label, a list of labels, a slice of labels, or a boolean array.
<column_label> accepts the same types as <row_label> and can be omitted to select all columns.
Let’s look at examples for each of these use cases.
Setting Up the Example DataFrame
Before diving into examples, let’s create a simple pandas DataFrame that we’ll use throughout this tutorial.
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])
This DataFrame looks like:
      Name  Age         City
A    Alice   25     New York
B      Bob   30  Los Angeles
C  Charlie   35      Chicago
D    David   40      Houston
E      Eva   45      Phoenix
Selecting Rows with loc
Selecting a Single Row
To select a single row by its label:
df.loc['A']
This will return:
Name       Alice
Age           25
City    New York
Name: A, dtype: object
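A single label returns a Series, as the output above shows. If a one-row DataFrame is more convenient, a one-element list can be passed instead. The sketch below rebuilds the tutorial's DataFrame so it runs on its own; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    },
    index=['A', 'B', 'C', 'D', 'E'],
)

row_as_series = df.loc['A']    # scalar label -> Series indexed by column name
row_as_frame = df.loc[['A']]   # one-element list -> DataFrame with one row

print(type(row_as_series).__name__)  # Series
print(type(row_as_frame).__name__)   # DataFrame
print(row_as_frame.shape)            # (1, 3)
```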
Selecting Multiple Rows
You can select multiple rows using a list of labels:
df.loc[['A', 'C', 'E']]
This will return a DataFrame with rows A, C, and E.
Slicing Rows
Use a slice of index labels to select a range of rows:
df.loc['B':'D']
Unlike regular Python slicing, loc includes both the start and end labels in the output.
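To make the inclusive behavior concrete, here is a small comparison of a label slice with the equivalent-looking position slice. It assumes the same example DataFrame, rebuilt here so the snippet is self-contained.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    },
    index=['A', 'B', 'C', 'D', 'E'],
)

by_label = df.loc['B':'D']    # label slice: the end label 'D' is included
by_position = df.iloc[1:3]    # position slice: the end position 3 is excluded

print(list(by_label.index))     # ['B', 'C', 'D']
print(list(by_position.index))  # ['B', 'C']
```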
Selecting Columns with loc
Selecting a Single Column
df.loc[:, 'Name']
This selects all rows but only the ‘Name’ column.
Selecting Multiple Columns
df.loc[:, ['Name', 'Age']]
This returns a DataFrame with only the ‘Name’ and ‘Age’ columns.
Selecting Specific Rows and Columns
To select specific rows and specific columns:
df.loc[['A', 'C'], ['Name', 'City']]
This returns the ‘Name’ and ‘City’ for rows A and C.
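Row and column selectors can be mixed freely, and the return type depends on what is passed. The following sketch, using the same example data, shows a scalar lookup, a Series, and a label slice on both axes.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    },
    index=['A', 'B', 'C', 'D', 'E'],
)

alice_name = df.loc['A', 'Name']        # two scalar labels -> a single value
ages = df.loc[['A', 'C'], 'Age']        # list of rows, one column -> Series
block = df.loc['A':'C', 'Name':'Age']   # slices on both axes, both inclusive

print(alice_name)           # Alice
print(ages.tolist())        # [25, 35]
print(list(block.columns))  # ['Name', 'Age']
```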
Using Boolean Conditions with loc
One of the most powerful uses of loc is filtering rows based on conditions.
Filter Rows Based on Column Value
df.loc[df['Age'] > 30]
This returns all rows where the ‘Age’ is greater than 30.
Filter and Select Specific Columns
df.loc[df['Age'] > 30, ['Name', 'City']]
This filters the rows and returns only the ‘Name’ and ‘City’ columns.
Combine Multiple Conditions
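Use the bitwise operators & (and), | (or), and ~ (not), and wrap each condition in parentheses, because these operators bind more tightly than comparisons such as > and !=.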
df.loc[(df['Age'] > 30) & (df['City'] != 'Chicago')]
This filters rows where age is over 30 and city is not Chicago.
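The example above uses &. As a sketch of the other two operators, the snippet below applies | and ~ to the same example data and also stores a combined mask in a variable (the name mask is just a convention) before passing it to loc.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    },
    index=['A', 'B', 'C', 'D', 'E'],
)

# OR: rows where either condition holds.
young_or_phoenix = df.loc[(df['Age'] < 30) | (df['City'] == 'Phoenix')]

# NOT: invert a condition.
not_over_30 = df.loc[~(df['Age'] > 30)]

# A named mask keeps longer conditions readable.
mask = (df['Age'] >= 30) & (df['Age'] <= 40)
names_30_to_40 = df.loc[mask, 'Name']

print(list(young_or_phoenix.index))  # ['A', 'E']
print(list(not_over_30.index))       # ['A', 'B']
print(names_30_to_40.tolist())       # ['Bob', 'Charlie', 'David']
```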
Updating Values with loc
You can also modify values in a DataFrame using loc.
Update a Single Value
df.loc['A', 'Age'] = 26
This changes Alice’s age to 26.
Update an Entire Row
df.loc['B'] = ['Bobby', 31, 'San Francisco']
This updates the entire row for index B.
Update Multiple Rows Conditionally
df.loc[df['Age'] > 40, 'City'] = 'Unknown'
This sets the ‘City’ to ‘Unknown’ for everyone older than 40.
Adding New Columns with loc
You can add a new column using loc as well.
df.loc[:, 'Senior'] = df['Age'] > 35
This adds a ‘Senior’ column with boolean values.
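A related pattern is to create a column with a default value and then overwrite part of it with a conditional loc assignment. The sketch below assumes the original example data; the 'Group' column and its values are hypothetical and only serve as illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    },
    index=['A', 'B', 'C', 'D', 'E'],
)

# New boolean column, as in the text above.
df.loc[:, 'Senior'] = df['Age'] > 35

# Hypothetical 'Group' column: start from a default, then overwrite matching rows.
df['Group'] = 'Under 35'
df.loc[df['Age'] >= 35, 'Group'] = '35 and over'

print(df[['Name', 'Senior', 'Group']])
```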
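Using loc with Index Reset
If your DataFrame doesn’t have custom index labels, you can still use loc with the default integer index. In that case the integers are treated as labels, not as positions, so df.loc[0] looks up the row whose label is 0.
If your DataFrame does have custom labels, you can reset the index to get default integer labels and then use numeric access like: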
df_reset = df.reset_index()
df_reset.loc[0]
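With an integer index, loc and iloc can look interchangeable, but they diverge on slices and after filtering. The following sketch illustrates both cases on the reset DataFrame; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    },
    index=['A', 'B', 'C', 'D', 'E'],
)

df_reset = df.reset_index()   # old labels move into a column named 'index'

# Label slice includes the end label; position slice excludes the end position.
by_label = df_reset.loc[0:2]       # labels 0, 1, 2 -> three rows
by_position = df_reset.iloc[0:2]   # positions 0, 1 -> two rows

# After filtering, labels and positions no longer line up.
older = df_reset.loc[df_reset['Age'] > 30]   # keeps labels 2, 3, 4
first_by_position = older.iloc[0]            # first remaining row (Charlie)
same_row_by_label = older.loc[2]             # the row labelled 2 (also Charlie)
# older.loc[0] would raise a KeyError, because no remaining row has the label 0.

print(len(by_label), len(by_position))   # 3 2
print(list(older.index))                 # [2, 3, 4]
```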
Differences Between loc and iloc
| Feature | loc | iloc |
|---|---|---|
| Based on | Labels | Integer positions |
| Slice behavior | Inclusive of both start & end | Exclusive of end index |
| Usage | df.loc['A'] | df.iloc[0] |
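Choosing between loc and iloc depends on whether you want to select by label or by position. With meaningful index labels, loc is usually the clearer choice; with a default integer index the two often return the same rows, but only until the DataFrame is filtered or reordered.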
Common Errors and How to Avoid Them
KeyError
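If the row or column label doesn’t exist, you’ll get a KeyError.
Fix: Check df.index and df.columns before selecting (for example, 'A' in df.index), or use df.get(), which returns a default value instead of raising when a column label is missing.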
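A few defensive patterns for missing labels are sketched below, assuming the example DataFrame. The label 'Z' and the column 'Salary' are hypothetical and deliberately absent from the data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    },
    index=['A', 'B', 'C', 'D', 'E'],
)

# 1. Membership test before a row lookup.
label = 'Z'   # hypothetical label that is not in the index
row = df.loc[label] if label in df.index else None

# 2. DataFrame.get returns a default (None here) for a missing column.
salary = df.get('Salary')   # hypothetical column name

# 3. Keep only the requested labels that actually exist.
wanted = ['A', 'C', 'Z']
subset = df.loc[df.index.intersection(wanted)]

# 4. Or catch the error explicitly.
try:
    df.loc['Z']
    found = True
except KeyError:
    found = False

print(row, salary, list(subset.index), found)
```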
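Mixing iloc with loc
Avoid combining loc and iloc in a single expression, such as df.loc['A'].iloc[1]. Chaining the two makes the code harder to read and, when used for assignment, may modify a temporary copy instead of the original DataFrame. If you need to mix a label with a position, convert one to the other first.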
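One way to avoid chaining the two indexers is to translate a position into a label (or a label into a position) and then use a single indexer. This is a minimal sketch on the example data, not the only approach.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    },
    index=['A', 'B', 'C', 'D', 'E'],
)

# Position -> label, then use loc only.
first_label = df.index[0]                     # 'A'
age_by_label = df.loc[first_label, 'Age']

# Label -> position, then use iloc only.
age_position = df.columns.get_loc('Age')      # 1
age_by_position = df.iloc[0, age_position]

print(age_by_label, age_by_position)   # 25 25
```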
Best Practices for Using loc
- Use clear and descriptive index labels when creating DataFrames
- Use slicing with loc for readability and maintainability
- Combine filtering and column selection in a single loc call for efficiency
- Avoid chained indexing like df['col'][cond] = value; use loc instead
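The last two points can be illustrated together. In the sketch below, the row filter and the column are passed to one loc call, so the assignment is applied directly to df rather than to an intermediate object. The replacement value 'Remote' is a made-up example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    },
    index=['A', 'B', 'C', 'D', 'E'],
)

mask = df['Age'] > 35

# Chained form to avoid: df['City'][mask] = 'Remote'
# It indexes twice, and depending on the pandas version it may warn
# or leave df unchanged.

# Single loc call: row filter and column in one indexing step.
df.loc[mask, 'City'] = 'Remote'

print(df['City'].tolist())
# ['New York', 'Los Angeles', 'Chicago', 'Remote', 'Remote']
```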
Conclusion
The loc indexer in pandas is an essential tool for label-based access and assignment in DataFrames. With it, you can select, filter, update, and manipulate your data more effectively. Mastering loc unlocks powerful data transformation capabilities and helps you write cleaner, more efficient code.
If you’re working with pandas regularly, practicing different loc patterns will not only save time but also prevent common bugs in your data pipelines. Happy coding!