Two Python DataFrame libraries, compared side by side — from basic operations to advanced analytics, with guidance on when to use which.
The most common operations at a glance. Pandas on the left, Polars on the right.
| Operation | Pandas | Polars |
|---|---|---|
| Create DataFrame | pd.DataFrame({"a": [1,2]}) | pl.DataFrame({"a": [1,2]}) |
| Read CSV | pd.read_csv("f.csv") | pl.read_csv("f.csv") |
| Select columns | df[["a", "b"]] | df.select("a", "b") |
| Filter rows | df[df["a"] > 5] | df.filter(pl.col("a") > 5) |
| Add column | df["c"] = df["a"] + 1 | df.with_columns((pl.col("a") + 1).alias("c")) |
| Group & aggregate | df.groupby("a").agg({"b": "sum"}) | df.group_by("a").agg(pl.col("b").sum()) |
| Sort | df.sort_values("a", ascending=False) | df.sort("a", descending=True) |
| Join | pd.merge(df1, df2, on="k") | df1.join(df2, on="k") |
| Drop nulls | df.dropna() | df.drop_nulls() |
| Rename columns | df.rename(columns={"a": "x"}) | df.rename({"a": "x"}) |
| Shape | df.shape | df.shape |
| Describe | df.describe() | df.describe() |
| Feature | Pandas | Polars |
|---|---|---|
| Index-based | ✓ | ✗ |
| Lazy evaluation | ✗ | ✓ |
| Multi-threaded | ✗ | ✓ |
| Arrow-native | ~ | ✓ |
| Ecosystem maturity | ✓ | ~ |
| Mutation (in-place) | ✓ | ✗ |
| Streaming large files | ✗ | ✓ |
| GPU acceleration | ~ (cuDF) | ✓ (GPU engine) |
Getting started with both libraries. They can coexist in the same environment.
# pip
pip install pandas
# conda
conda install pandas
# With optional dependencies
pip install pandas[all]
# Specific extras
pip install pandas[parquet,excel]
# pip
pip install polars
# conda
conda install polars
# With all optional features
pip install polars[all]
# Specific extras
pip install polars[numpy,pandas,pyarrow]
import pandas as pd
import numpy as np
# Version check
print(pd.__version__)
import polars as pl
# Version check
print(pl.__version__)
# Optional: expression namespace
from polars import col, lit, when
Convert between the two with df_polars = pl.from_pandas(df_pandas) and df_pandas = df_polars.to_pandas().
Polars uses Arrow memory, so conversion to Pandas 2.x with ArrowDtype can be zero-copy.
The fundamental data structures. Both libraries center on the DataFrame, but differ in their approach to indexing and mutability.
# From dict
df = pd.DataFrame({
"name": ["Alice", "Bob", "Carol"],
"age": [30, 25, 35],
"score": [85.5, 92.0, 78.3]
})
# With explicit index
df = pd.DataFrame(
{"x": [1,2,3]},
index=["a","b","c"]
)
# From list of dicts
df = pd.DataFrame([
{"a": 1, "b": 2},
{"a": 3, "b": 4}
])
# From NumPy array
df = pd.DataFrame(
np.random.randn(5, 3),
columns=["x", "y", "z"]
)
# From dict
df = pl.DataFrame({
"name": ["Alice", "Bob", "Carol"],
"age": [30, 25, 35],
"score": [85.5, 92.0, 78.3]
})
# No index concept — add a row index column if needed
df = pl.DataFrame({
"x": [1,2,3]
}).with_row_index("idx")
# From list of dicts
df = pl.DataFrame([
{"a": 1, "b": 2},
{"a": 3, "b": 4}
])
# With explicit schema
df = pl.DataFrame(
{"x": [1,2], "y": [3.0,4.0]},
schema={"x": pl.Int32, "y": pl.Float64}
)
# Create a Series
s = pd.Series([1, 2, 3], name="values")
# Access by label/position
s.loc[0] # label-based
s.iloc[0] # position-based
# String methods
df["name"].str.upper()
df["name"].str.contains("li")
# Vectorized math
(df["score"] * 1.1).round(1)
# Create a Series
s = pl.Series("values", [1, 2, 3])
# Access by position (no label index)
s[0] # integer indexing
s.gather([0, 2]) # multiple indices
# String methods (via expressions)
df.select(pl.col("name").str.to_uppercase())
df.select(pl.col("name").str.contains("li"))
# Vectorized math
df.select((pl.col("score") * 1.1).round(1))
# Select columns
df[["name", "age"]]
# Add / mutate column (in-place)
df["senior"] = df["age"] >= 30
# Add via assign (returns copy)
df = df.assign(
senior=lambda x: x["age"] >= 30,
grade=lambda x: np.where(
x["score"] >= 90, "A", "B"
)
)
# Drop columns
df.drop(columns=["score"])
# Rename
df.rename(columns={"age": "years"})
# Select columns
df.select("name", "age")
df.select(pl.col("name", "age"))
# Add / mutate column (returns new df)
df = df.with_columns(
(pl.col("age") >= 30).alias("senior")
)
# Multiple new columns at once
df = df.with_columns(
(pl.col("age") >= 30).alias("senior"),
pl.when(pl.col("score") >= 90)
.then(pl.lit("A"))
.otherwise(pl.lit("B"))
.alias("grade")
)
# Drop columns
df.drop("score")
# Rename
df.rename({"age": "years"})
| Type | Pandas | Polars | Notes |
|---|---|---|---|
| Integer | int64, Int64 | Int8..Int64 | Polars is strict about width |
| Float | float64 | Float32, Float64 | |
| String | object, string | Utf8 (String) | Polars uses Arrow strings |
| Boolean | bool, boolean | Boolean | |
| Datetime | datetime64[ns] | Datetime | Polars supports ms/us/ns precision (microsecond default) |
| Categorical | category | Categorical, Enum | Polars Enum is strict/ordered |
| Nested | object (dicts/lists) | List, Struct, Array | Polars has first-class nested types |
| Null handling | NaN / pd.NA | null | Polars uses Arrow null bitmask |
Both libraries support CSV, Parquet, JSON, Excel, and databases. Polars adds lazy scanning for files larger than memory.
# Read
df = pd.read_csv("data.csv")
df = pd.read_csv("data.csv",
usecols=["a", "b"],
dtype={"a": "int32"},
nrows=1000,
parse_dates=["date"]
)
# Chunked reading
for chunk in pd.read_csv("big.csv",
chunksize=10000):
process(chunk)
# Write
df.to_csv("out.csv", index=False)
# Read (eager)
df = pl.read_csv("data.csv")
df = pl.read_csv("data.csv",
columns=["a", "b"],
schema_overrides={"a": pl.Int32},  # dtypes= before Polars 1.0
n_rows=1000,
try_parse_dates=True
)
# Lazy scan (larger than memory)
lf = pl.scan_csv("big.csv")
result = lf.filter(
pl.col("a") > 100
).collect()
# Write
df.write_csv("out.csv")
# Requires pyarrow or fastparquet
df = pd.read_parquet("data.parquet")
df = pd.read_parquet("data.parquet",
columns=["a", "b"],
engine="pyarrow"
)
# Write with compression
df.to_parquet("out.parquet",
compression="snappy",
index=False
)
# Read from S3 (with s3fs)
df = pd.read_parquet("s3://bucket/data.parquet")
# Native Rust parquet reader
df = pl.read_parquet("data.parquet")
df = pl.read_parquet("data.parquet",
columns=["a", "b"]
)
# Write with compression
df.write_parquet("out.parquet",
compression="zstd"
)
# Lazy scan — predicate pushdown!
lf = pl.scan_parquet("s3://bucket/data.parquet")
result = lf.filter(
pl.col("date") > "2024-01-01"
).select("a", "b").collect()
| Format | Pandas | Polars |
|---|---|---|
| JSON | pd.read_json() / df.to_json() | pl.read_json() / df.write_json() |
| NDJSON | pd.read_json(lines=True) | pl.read_ndjson() / pl.scan_ndjson() |
| Excel | pd.read_excel() | pl.read_excel() |
| SQL | pd.read_sql(query, conn) | pl.read_database(query, conn) |
| IPC / Arrow | pd.read_feather() | pl.read_ipc() / pl.scan_ipc() |
| Delta Lake | via deltalake pkg | pl.read_delta() / pl.scan_delta() |
Selecting rows and columns. Pandas uses bracket indexing and .loc/.iloc; Polars uses expression-based filter/select.
# Boolean mask
df[df["age"] > 25]
# Multiple conditions
df[(df["age"] > 25) & (df["score"] >= 80)]
# Using .query() (string DSL)
df.query("age > 25 and score >= 80")
# isin
df[df["name"].isin(["Alice", "Bob"])]
# String contains
df[df["name"].str.startswith("A")]
# Between
df[df["age"].between(25, 35)]
# Null filtering
df[df["score"].notna()]
# Head / tail / sample
df.head(5)
df.tail(5)
df.sample(3)
# Expression-based filter
df.filter(pl.col("age") > 25)
# Multiple conditions
df.filter(
(pl.col("age") > 25) & (pl.col("score") >= 80)
)
# Using SQL (optional; returns a LazyFrame)
pl.sql("SELECT * FROM df WHERE age > 25").collect()
# is_in
df.filter(pl.col("name").is_in(["Alice", "Bob"]))
# String contains
df.filter(pl.col("name").str.starts_with("A"))
# is_between
df.filter(pl.col("age").is_between(25, 35))
# Null filtering
df.filter(pl.col("score").is_not_null())
# Head / tail / sample
df.head(5)
df.tail(5)
df.sample(3)
# Label-based indexing
df.loc[0, "name"] # single cell
df.loc[0:2, ["name", "age"]] # slice
# Position-based indexing
df.iloc[0] # first row
df.iloc[0:5, 1:3] # row/col slice
# Setting values
df.loc[0, "age"] = 31
# Conditional setting
df.loc[df["age"] > 30, "senior"] = True
# No label-based index; use row/item
df.row(0) # tuple
df.item(0, "name") # single cell
# Slicing
df.slice(0, 5) # offset, length
df[0:5] # Python slice syntax
# Polars is immutable — use with_columns
df = df.with_columns(
pl.when(pl.col("age") > 30)
.then(pl.lit(True))
.otherwise(pl.lit(False))
.alias("senior")
)
# Gather specific rows
df.gather([0, 3, 7])
# Add an explicit row index column if needed
df.with_row_index("idx")
Split-apply-combine patterns. Both libraries support grouping with multiple aggregation functions, but the syntax differs significantly.
# Single aggregation
df.groupby("dept")["salary"].mean()
# Multiple aggregations
df.groupby("dept").agg({
"salary": ["mean", "max"],
"age": "count"
})
# Named aggregations (recommended)
df.groupby("dept").agg(
avg_salary=("salary", "mean"),
max_salary=("salary", "max"),
headcount=("age", "count")
)
# Group by multiple columns
df.groupby(["dept", "level"]).mean(numeric_only=True)
# Single aggregation
df.group_by("dept").agg(
pl.col("salary").mean()
)
# Multiple aggregations
df.group_by("dept").agg(
pl.col("salary").mean().alias("avg_salary"),
pl.col("salary").max().alias("max_salary"),
pl.col("age").count().alias("headcount"),
)
# Expressions are composable
df.group_by("dept").agg(
(pl.col("salary").max() - pl.col("salary").min())
.alias("salary_range")
)
# Group by multiple columns
df.group_by("dept", "level").agg(
pl.all().mean()
)
# Transform (broadcast back to original shape)
df["z_score"] = df.groupby("dept")["salary"] \
.transform(lambda x: (x - x.mean()) / x.std())
# Filter groups
df.groupby("dept").filter(
lambda g: g["salary"].mean() > 50000
)
# Apply custom function
df.groupby("dept").apply(
lambda g: g.nlargest(3, "salary")
)
# Pivot table
pd.pivot_table(df,
values="salary",
index="dept",
columns="level",
aggfunc="mean"
)
# Over expressions (like SQL window funcs)
df = df.with_columns(
((pl.col("salary") - pl.col("salary").mean().over("dept"))
/ pl.col("salary").std().over("dept"))
.alias("z_score")
)
# Filter groups
df.filter(
pl.col("salary").mean().over("dept") > 50000
)
# Top N per group
df.sort("salary", descending=True) \
.group_by("dept").head(3)
# Pivot
df.pivot(
on="level",
index="dept",
values="salary",
aggregate_function="mean"
)
| Function | Pandas | Polars |
|---|---|---|
| Count | .count() | .count() |
| Sum | .sum() | .sum() |
| Mean | .mean() | .mean() |
| Median | .median() | .median() |
| Std Dev | .std() | .std() |
| Min / Max | .min() / .max() | .min() / .max() |
| First / Last | .first() / .last() | .first() / .last() |
| N unique | .nunique() | .n_unique() |
| Quantile | .quantile(0.75) | .quantile(0.75) |
| Value counts | .value_counts() | .value_counts() |
Combining DataFrames. Pandas uses merge/join/concat; Polars uses join/concat with more explicit join types.
# Inner join
result = pd.merge(orders, customers,
on="customer_id",
how="inner"
)
# Left join
result = pd.merge(orders, customers,
on="customer_id",
how="left"
)
# Join on different column names
result = pd.merge(orders, customers,
left_on="cust_id",
right_on="id"
)
# Multi-key join
result = pd.merge(df1, df2,
on=["year", "month"]
)
# Inner join
result = orders.join(customers,
on="customer_id",
how="inner"
)
# Left join
result = orders.join(customers,
on="customer_id",
how="left"
)
# Join on different column names
result = orders.join(customers,
left_on="cust_id",
right_on="id"
)
# Multi-key join
result = df1.join(df2,
on=["year", "month"]
)
| Join Type | Pandas how= | Polars how= | Description |
|---|---|---|---|
| Inner | "inner" | "inner" | Only matching rows |
| Left | "left" | "left" | All left rows + matching right |
| Right | "right" | "right" (since v1.x) | All right rows + matching left |
| Full outer | "outer" | "full" | All rows from both sides |
| Cross | "cross" | "cross" | Cartesian product |
| Semi | N/A (use isin) | "semi" | Left rows where key exists in right |
| Anti | N/A (use ~isin) | "anti" | Left rows where key NOT in right |
# Vertical stack (union)
result = pd.concat([df1, df2, df3])
# Horizontal (side by side)
result = pd.concat([df1, df2], axis=1)
# Reset index after concat
result = pd.concat([df1, df2],
ignore_index=True
)
# Vertical stack (union)
result = pl.concat([df1, df2, df3])
# Horizontal (side by side)
result = pl.concat([df1, df2], how="horizontal")
# Diagonal (union with different schemas)
result = pl.concat([df1, df2],
how="diagonal"
)
Computing values relative to groups or ordered windows. Polars uses the powerful .over() expression; Pandas uses .transform() and rolling APIs.
# Running total per group
df["cum_sales"] = df.groupby("dept")["sales"] \
.cumsum()
# Rank within group
df["rank"] = df.groupby("dept")["sales"] \
.rank(ascending=False, method="dense")
# Shift (lag/lead)
df["prev_sales"] = df.groupby("dept")["sales"] \
.shift(1)
# Pct change within group
df["growth"] = df.groupby("dept")["sales"] \
.pct_change()
# Running total per group
df = df.with_columns(
pl.col("sales").cum_sum().over("dept")
.alias("cum_sales")
)
# Rank within group
df = df.with_columns(
pl.col("sales").rank(descending=True,
method="dense").over("dept")
.alias("rank")
)
# Shift (lag/lead)
df = df.with_columns(
pl.col("sales").shift(1).over("dept")
.alias("prev_sales")
)
# Pct change within group
df = df.with_columns(
pl.col("sales").pct_change().over("dept")
.alias("growth")
)
# Rolling average
df["ma7"] = df["price"].rolling(7).mean()
# Rolling std with min_periods
df["vol"] = df["price"].rolling(
window=20, min_periods=10
).std()
# Expanding (cumulative) window
df["cum_max"] = df["price"].expanding().max()
# EWM (exponential weighted)
df["ewm"] = df["price"].ewm(span=12).mean()
# Rolling average
df = df.with_columns(
pl.col("price").rolling_mean(window_size=7)
.alias("ma7")
)
# Rolling std with min_periods
df = df.with_columns(
pl.col("price").rolling_std(
window_size=20, min_periods=10
).alias("vol")
)
# Expanding (cumulative) window
df = df.with_columns(
pl.col("price").cum_max().alias("cum_max")
)
# EWM (exponential weighted)
df = df.with_columns(
pl.col("price").ewm_mean(span=12)
.alias("ewm")
)
Polars' killer feature: build a query plan, then let the optimizer rewrite and parallelize it before execution. Pandas has no equivalent.
# Create a LazyFrame from eager DataFrame
lf = df.lazy()
# Or scan directly (recommended for large files)
lf = pl.scan_csv("huge_dataset.csv")
lf = pl.scan_parquet("data/*.parquet") # glob patterns!
lf = pl.scan_ipc("data.arrow")
# Build a query plan (nothing executes yet)
result = (
lf
.filter(pl.col("country") == "US")
.group_by("state")
.agg(
pl.col("revenue").sum().alias("total_rev"),
pl.col("customers").n_unique().alias("unique_cust"),
)
.sort("total_rev", descending=True)
.head(10)
)
# Execute the plan
df_result = result.collect()
# Inspect the optimized plan
print(result.explain(optimized=True))
Filters are pushed as close to the data source as possible, reducing rows read from disk.
# Filter pushed into scan
(
    pl.scan_parquet("data.parquet")
    .filter(pl.col("year") == 2024)
    .collect()
)
# Only reads matching row groups
Only the columns used downstream are read from the source file.
# Only reads "name" and "score" columns
(
    pl.scan_csv("wide_table.csv")
    .select("name", "score")
    .collect()
)
Duplicate computations are detected and computed once.
# mean(x) computed once, reused
lf.with_columns(
(pl.col("x") - pl.col("x").mean()).alias("centered"),
(pl.col("x") / pl.col("x").mean()).alias("normed"),
)
Limits are pushed down to avoid processing unnecessary rows.
# Only processes 10 groups total
(
    lf.group_by("region").agg(
        pl.col("sales").sum()
    )
    .sort("sales", descending=True)
    .head(10)
    .collect()
)
# Process data larger than memory in streaming chunks
result = (
pl.scan_csv("100gb_file.csv")
.filter(pl.col("status") == "active")
.group_by("category")
.agg(pl.col("amount").sum())
.collect(streaming=True)  # newer Polars: collect(engine="streaming")
)
# Sink results directly to file (no memory needed)
(
pl.scan_parquet("input/*.parquet")
.filter(pl.col("date") >= "2024-01-01")
.sink_parquet("output/filtered.parquet")
)
Pandas has no lazy engine: pd.read_csv("f.csv", chunksize=N) lets you iterate in chunks, but you must manage concatenation and aggregation manually, and there is no query optimizer.
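A hedged sketch of that manual pattern, which a Polars streaming group-by replaces; the in-memory CSV stands in for a large file on disk, and the column names are illustrative:

```python
import io
import pandas as pd

# Stand-in for a large CSV on disk
csv_data = io.StringIO(
    "category,amount\n" + "\n".join(f"{'ab'[i % 2]},{i}" for i in range(10))
)

# Manual chunked aggregation: accumulate partial sums per chunk
partials = []
for chunk in pd.read_csv(csv_data, chunksize=4):
    partials.append(chunk.groupby("category")["amount"].sum())

# Combine the per-chunk partials into the final aggregate
totals = pd.concat(partials).groupby(level=0).sum()
print(totals)
```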
Benchmark comparisons and memory usage patterns. Polars is consistently faster due to its Rust engine, multi-threading, and Arrow memory model.
Relative execution time (lower = faster), based on a typical 10M-row CSV workload on an 8-core machine.
Written in Rust, compiled to native code. No Python GIL overhead for data operations. The Python API is a thin wrapper over the Rust engine.
Automatically parallelizes across all available CPU cores. GroupBy, joins, and aggregations run in parallel without user configuration.
Uses Arrow columnar format natively. Cache-friendly memory layout, zero-copy interop with other Arrow-based tools, and efficient null handling.
Lazy mode rewrites your query plan: eliminates redundant operations, pushes predicates/projections, and fuses operations for minimal passes over data.
| Aspect | Pandas | Polars |
|---|---|---|
| String storage | Python objects (high overhead) | Arrow strings (contiguous, compact) |
| Null handling | NaN sentinel (wastes float64 space) | Arrow null bitmask (1 bit per value) |
| Integer nullability | Upcasts to float64 for NaN | Native nullable integers |
| Copy-on-write | CoW in Pandas 2.x (opt-in) | Immutable by design |
| Peak memory | Often 5-10x dataset size | Usually 2-3x dataset size |
| Larger-than-RAM | Manual chunking only | Streaming engine + sink |
# Pandas: check memory usage
df.info(memory_usage="deep")
df.memory_usage(deep=True).sum() / 1e6 # MB
# Polars: check memory usage
df.estimated_size("mb")
# Pandas: downcast types to save memory
df["id"] = pd.to_numeric(df["id"], downcast="integer")
df["name"] = df["name"].astype("category")
# Polars: cast to smaller types
df = df.with_columns(
pl.col("id").cast(pl.Int32),
pl.col("name").cast(pl.Categorical)
)
Both libraries are excellent. The right choice depends on your dataset size, team familiarity, ecosystem needs, and performance requirements.
Your workflow depends on libraries that expect Pandas DataFrames: scikit-learn, matplotlib, seaborn, statsmodels, Prophet, etc. Most ML and visualization libraries assume Pandas.
Working with datasets under ~1-5 GB that fit comfortably in memory. The performance difference with Polars may not matter at this scale, and Pandas is more familiar.
Your team already knows Pandas well. Over a decade of Stack Overflow answers, tutorials, and institutional knowledge. The switching cost may not be justified.
Jupyter notebook exploratory analysis where mutation (in-place modification) and the index are convenient. Quick prototyping, plotting, and ad-hoc analysis.
Working with large datasets (1GB+), production ETL pipelines, or any scenario where wall-clock time matters. Polars is typically 5-20x faster than Pandas.
Data doesn't fit in RAM. Polars' lazy evaluation with streaming and sink operations can process arbitrarily large datasets with bounded memory.
Building production data pipelines where correctness and performance matter. Strict typing, immutability, and the query optimizer catch bugs early and optimize automatically.
Starting a new project with no legacy constraints. Polars' expression-based API is more consistent and composable than Pandas' mixed bag of methods.
| Scenario | Recommendation | Why |
|---|---|---|
| EDA in Jupyter | Pandas | Better plotting integration, mutation convenience |
| ETL pipeline > 1GB | Polars | Lazy eval, streaming, multi-threaded |
| ML feature engineering | Either | Polars for speed, Pandas for sklearn compat |
| Real-time data processing | Polars | Lower latency, no GIL contention |
| Legacy codebase | Pandas | Migration cost; gradual Polars adoption possible |
| Time-series analysis | Either | Both have rolling/window; Pandas has more ts-specific tools |
| Parquet data lake | Polars | scan_parquet with predicate pushdown is unbeatable |
| New green-field project | Polars | Better API design, faster, growing ecosystem |
import pandas as pd
import polars as pl
# 1. Use Polars for heavy ETL
clean_data = (
pl.scan_parquet("raw/*.parquet")
.filter(pl.col("valid") == True)
.group_by("category")
.agg(
pl.col("revenue").sum(),
pl.col("orders").count(),
)
.sort("revenue", descending=True)
.collect()
)
# 2. Convert to Pandas for ML / visualization
df_pd = clean_data.to_pandas()
# 3. Use with scikit-learn
from sklearn.preprocessing import StandardScaler
X = StandardScaler().fit_transform(df_pd[["revenue", "orders"]])
# 4. Plot with matplotlib/seaborn
import matplotlib.pyplot as plt
df_pd.plot.bar(x="category", y="revenue")
plt.show()
With Pandas 2.x's ArrowDtype backends, converting between Polars and Pandas can be zero-copy (no data duplication). Use df_polars.to_pandas(use_pyarrow_extension_array=True) to preserve the Arrow memory layout in Pandas.