Pandas Indexing & Slicing Syntax Cheat Sheet
Introduction
Pandas is a data manipulation library in Python.
I’m not writing about it because I like it. It’s syntax is everything but helpful.
The problem is that Pandas is used a lot in the Python ecosystem.
Something simple like slicing and indexing is horribly complicated. The API is based on the internals, not on the mathematics or logic of the user that the user wants to express. This makes it hard to remember, use and read the syntax.
I have to write this post as a note to myself and others who struggle with the syntax of Pandas indexing and slicing.
Pandas Indexing & Slicing Methods Overview
Pure vs Impure methods
Indexing and slicing can be done based on labels (names) or positions (integers). Pandas provides several methods. The pure methods work with only labels or only positions. The mixed methods work with both, depending on context. This is even harder to memorize.
“Pure” Methods (Consistent behavior):
.loc[]
- Label-based Indexing
The .loc[]
method returns the carthesian product of rows and columns. It strictly label-based. It always uses the index and column names.
df.loc[row_labels, col_labels]'a':'c'] # INCLUSIVE on both ends
df.loc['col'] > 5] # Boolean indexing
df.loc[df[# Always uses index/column names
The labels are the values on the index (rows) and the column names (columns).
Here is a concrete example:
import pandas as pd
= pd.DataFrame({'A': [1, 2, 3],
df 'B': [4, 5, 6]},
=['x', 'y', 'z'])
index'x'] # Returns Series: A=1, B=4
df.loc['x', 'A'] # Returns: 1 (scalar)
df.loc['x':'y'] # Returns DataFrame (2 rows)
df.loc['x':'y', 'A'] # Returns Series: x=1, y=2
df.loc['x', ['A', 'B']] # Returns Series: A=1, B=4 df.loc[
.iloc[]
- Position-based Indexing
The .iloc[]
method also returns the carthesian product of rows and columns. This method is strictly position-based. It always uses integer positions.
df.iloc[row_pos, col_pos]0:3] # EXCLUSIVE end (standard Python)
df.iloc[-1] # Last row
df.iloc[# Always uses integer positions
.at[]
/ .iat[]
- Single scalar value
The methods .at[]
and .iat[]
are optimized for speed when accessing a single value. .at[]
is label-based, .iat[]
is position-based.
The syntac is similar to .loc[]
and .iloc[]
, but only for single values:
'row_label', 'col_label'] # Label-based scalar
df.at[0, 1] # Position-based scalar
df.iat[# Fastest for single values
“Impure” Methods (Mixed/context-dependent behavior):
The following methods have mixed or context-dependent behavior.
df[]
- Convenience accessor
'col'] # Column (by name)
df['col1', 'col2']] # Columns (by names)
df[[0:3] # Rows (by POSITION!)
df['col'] > 5] # Rows (by boolean)
df[df['2020':'2021'] # Rows (by label if index is dates/strings)
df[# Behavior changes based on input type!
.xs()
- Cross-section
Above examples use normal indexes. The Cross section is useful when working with MultiIndexes.
The syntax is:
'key') # Can be label or level in MultiIndex
df.xs('a', 'b'), level=[0,1]) # Mixed with MultiIndex for rows
df.xs(('col_name', axis=1, level='level_name')
df.xs(# Behavior depends on index structure
Here is an example with MultiIndex rows:
# MultiIndex DataFrame
= [['bar', 'bar', 'baz', 'baz'],
arrays 'one', 'two', 'one', 'two']]
[= pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
index = pd.DataFrame({'A': [1, 2, 3, 4]}, index=index)
df
# Select cross-section at first level
'bar', level='first')
df.xs(# A
# second
# one 1
# two 2
# Select cross-section at second level
'one', level='second')
df.xs(# A
# first
# bar 1
# baz 3
# Multiple levels
'bar', 'two')) # Returns: A=2 (scalar/Series) df.xs((
.query()
- String evaluation
Query is a Convenience method that evalues sql-like expressions on the DataFrame.
'col > 5') # Evaluates string
df.query('col > @var') # Mixes with Python variables
df.query(# Mixes string parsing with DataFrame operations
Overview
This is how I remember the different methods / look the up.
Comparison Table:
Method | Type | Label/Position | Use Case | Predictable? |
---|---|---|---|---|
.loc[] |
Pure | Label | Explicit label-based access | ✓ Always |
.iloc[] |
Pure | Position | Explicit position-based access | ✓ Always |
.at[] |
Pure | Label | Fast single value | ✓ Always |
.iat[] |
Pure | Position | Fast single value | ✓ Always |
df[] |
Impure | Mixed | Quick access | ✗ Context-dependent |
.xs() |
Impure | Mixed | MultiIndex slicing | ✗ Depends on index |
.query() |
Impure | Label | Complex filters | ~ String parsing |
Other methods (Special purpose):
Boolean/conditional:
# Conditional replacement (keeps shape)
df.where() # Inverse of where
df.mask() filter() # Filter columns/rows by name pattern df.
Selection helpers:
/tail() # First/last n rows
df.head()# Random rows
df.sample() /nsmallest() # By values
df.nlargest()# By position array df.take()
Best Practices:
- Use pure methods when explicit behavior needed
- Use
.loc[]
for setting values (avoid chained indexing) - Use
df[]
only for simple column access - Be careful with
df[slice]
- it’s position-based, not label-based!
Conclusion
Remembering the Pandas API is hard if you do not use it daily. The consise syntax of impure methods like df[]
and .xs()
looks great, but is hard to remember and read.
When it comes to indexing and slicing, prefer the pure methods .loc[]
and .iloc[]
for clarity and predictability.