# API Reference
A comprehensive technical guide to the qutePandas public API. This library translates standard Pandas operations into optimized kdb+ primitives, offloading heavy computations to a high-performance vector engine while maintaining the developer experience of the Python data ecosystem.
## Core

### DataFrame(data, columns=None)

*Pandas equivalent:* `pd.DataFrame()`

Initialize a kdb+ memory-mapped table from standard Python objects. This constructor bridges the gap between row-major Python memory and columnar kdb+ performance, materializing data immediately into the backend. For out-of-core datasets larger than RAM, consider using `from_csv` with memory mapping.

**Parameters**

| Name | Type | Description |
|---|---|---|
| data | dict, list, or pd.DataFrame | Source data. Dictionary keys become column headers; nested lists are treated as row records. Existing DataFrames are efficiently mirrored into the kdb+ workspace. |
| columns | list of str | Explicit schema definition. Required when the input data is a list without intrinsic header information. |

**Example**

```python
import qutePandas as qpd

data = {'A': [1, 2], 'B': [3, 4]}
df = qpd.DataFrame(data)
```
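The row-major-to-columnar bridge can be sketched in plain Python. This is illustrative only (the real materialization happens inside kdb+), and the helper name `to_columnar` is hypothetical:

```python
# Sketch: how dict and list-of-rows inputs map to a columnar layout,
# mirroring what the DataFrame constructor does conceptually.
# Illustrative only -- the real work happens inside kdb+.

def to_columnar(data, columns=None):
    """Normalize dict or list-of-rows input into {column: values}."""
    if isinstance(data, dict):
        # Dictionary keys become column headers.
        return {k: list(v) for k, v in data.items()}
    # Nested lists are row records; an explicit schema is required.
    if columns is None:
        raise ValueError("columns is required for list input")
    return {c: [row[i] for row in data] for i, c in enumerate(columns)}

cols = to_columnar([[1, 3], [2, 4]], columns=['A', 'B'])
```

Both input shapes end up in the same column-vector form, which is what the kdb+ backend operates on.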
### connect(license_path=None)

*Pandas equivalent:* N/A

Global initializer for the kdb+ runtime environment. This function verifies license validity and manages critical environment variables such as `QLIC` and `QHOME`, which are required for embedded PyKX operations. It must be called once at the start of any session.

**Parameters**

| Name | Type | Description |
|---|---|---|
| license_path | str | Optional explicit path to a kc.lic or k4.lic file. If omitted, the library searches standard paths and project-root folders. |

**Example**

```python
import qutePandas as qpd

qpd.connect()
```
### install_license(content, is_base64=True)

*Pandas equivalent:* N/A

Programmatically installs a kdb+ license token or file content. This is particularly useful for containerized or ephemeral environments (such as CI/CD pipelines) where environment variables are preferred over physical file deployments.

**Parameters**

| Name | Type | Description |
|---|---|---|
| content | str | The base64-encoded license token or raw content of a license file. |
| is_base64 | bool | Whether the content string is base64-encoded. Defaults to True. |

**Example**

```python
import qutePandas as qpd

qpd.install_license("YOUR_LICENSE_CONTENT")
```
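For CI/CD use, the base64 encoding step can be done with the standard library. A minimal sketch (the license bytes shown are placeholder content, not a real license):

```python
import base64

# Sketch: preparing license content for install_license in a CI pipeline.
# The raw file bytes are base64-encoded so they can travel safely in an
# environment variable; is_base64=True tells the library to decode them.
raw = b"placeholder-license-bytes"
token = base64.b64encode(raw).decode('ascii')

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(token)
```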
### print(obj, head=None, tail=None)

*Pandas equivalent:* `print(df.head())` / `print(df.tail())`

Display a kdb+ table with formatted output, including borders and column alignment. It avoids converting to pandas for better performance.

**Parameters**

| Name | Type | Description |
|---|---|---|
| obj | pykx.Table or pykx.KeyedTable | The table to display. |
| head | int | Number of rows from the beginning to display. If specified, tail is ignored. |
| tail | int | Number of rows from the end to display. |

**Example**

```python
import qutePandas as qpd

df = qpd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]})
qpd.print(df, head=5)
qpd.print(df, tail=3)
```
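Column-aligned rendering of this kind can be sketched in pure Python. The exact output format of `qpd.print` is not specified here; this `render` helper is a hypothetical stand-in showing one way to align columns without converting to pandas:

```python
# Sketch of column-aligned table rendering (illustrative pure-Python
# version; the real function formats pykx tables directly).
def render(columns):
    names = list(columns)
    # Each column is as wide as its header or widest value.
    widths = [max(len(n), *(len(str(v)) for v in columns[n])) for n in names]
    sep = '-+-'.join('-' * w for w in widths)
    header = ' | '.join(n.ljust(w) for n, w in zip(names, widths))
    rows = [' | '.join(str(v).ljust(w) for v, w in zip(r, widths))
            for r in zip(*columns.values())]
    return '\n'.join([header, sep] + rows)

out = render({'name': ['Alice', 'Bob'], 'age': [25, 30]})
```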
## Indexing & Selection

### loc(df, rows=None, cols=None, return_type='q')

*Pandas equivalent:* `df.loc[rows, cols]`

Purely label-based indexing for selection by label or boolean array. Currently supports filtering rows via boolean mask.

**Parameters**

| Name | Type | Description |
|---|---|---|
| rows | list or pykx.BooleanVector | Boolean mask or labels for row selection. |
| cols | str or list of str | Column names to select. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

mask = qpd.kx.q('>', df['price'], 100)
df_subset = qpd.loc(df, rows=mask, cols=['symbol', 'volume'])
```
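Boolean-mask selection follows the same semantics as elsewhere in the Python data ecosystem. A minimal pure-Python sketch of what the mask does (illustrative values, not library calls):

```python
# Sketch of boolean-mask row selection: the mask marks which rows
# survive, exactly as loc() applies it inside kdb+.
prices = [90, 150, 120, 80]
symbols = ['a', 'b', 'c', 'd']

mask = [p > 100 for p in prices]        # analogous to qpd.kx.q('>', col, 100)
selected = [s for s, keep in zip(symbols, mask) if keep]
```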
### iloc(df, rows=None, cols=None, return_type='q')

*Pandas equivalent:* `df.iloc[rows, cols]`

Purely integer-location based indexing for selection by position.

**Parameters**

| Name | Type | Description |
|---|---|---|
| rows | int, list, or slice | Row indices to select. |
| cols | int, list, or slice | Column indices to select. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

# Select first 10 rows and first 2 columns
df_subset = qpd.iloc(df, rows=slice(0, 10), cols=slice(0, 2))
```
## Cleaning

### dropna(df, return_type='q')

*Pandas equivalent:* `df.dropna()`

Eliminates any records containing null values across the entire dataset. This operation leverages kdb+'s efficient vector null-checking to prune incomplete data, preventing the propagation of nulls in downstream analytical models.

**Parameters**

| Name | Type | Description |
|---|---|---|
| df | Source | The input qutePandas or PyKX Table to be cleaned. |
| return_type | str | Controls the output format: 'q' for maximum performance or 'p' for immediate conversion back to Pandas. |

**Example**

```python
import qutePandas as qpd

df_clean = qpd.dropna(df)
```
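The record-level semantics can be sketched in plain Python, with `None` standing in for kdb+ nulls such as `0Nh` (illustrative only; the real check is a vector operation inside kdb+):

```python
# Sketch of dropna() semantics: discard every record that contains
# a null in any column. None stands in for the kdb+ null.
table = {'a': [1, None, 3], 'b': [10, 20, None]}

rows = list(zip(*table.values()))
kept = [r for r in rows if all(v is not None for v in r)]
clean = {c: [r[i] for r in kept] for i, c in enumerate(table)}
```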
### dropna_col(df, col, return_type='q')

*Pandas equivalent:* `df.dropna(subset=[col])`

Targeted null removal focused on a specific column. This is essential for datasets where some attributes are permitted to be null, but core identifier or feature columns must remain complete for valid computation.

**Parameters**

| Name | Type | Description |
|---|---|---|
| col | str | The target column to scan for nulls. The library handles various null representations (e.g., 0Nh for integers or "" for strings) automatically based on the column's underlying kdb+ type. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df_clean = qpd.dropna_col(df, col='price')
```
### fillna(df, col, value, return_type='q')

*Pandas equivalent:* `df.fillna()`

Replaces null entries with a specified constant value. This enables "zero-filling" or default assignment in feature engineering pipelines without creating expensive memory copies, using kdb+'s native `^` (fill) operator.

**Parameters**

| Name | Type | Description |
|---|---|---|
| col | str | The column to apply the fill operation to. |
| value | scalar | The replacement value. Python strings are automatically converted to kdb+ symbols where appropriate for optimized storage. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df_filled = qpd.fillna(df, col='volume', value=0)
```
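The `^` (fill) operator's semantics are simple to sketch: every null in the right-hand vector is replaced by the scalar on the left. A pure-Python illustration, with `None` standing in for the kdb+ null:

```python
# Sketch of kdb+'s ^ (fill) semantics: replace each null in the column
# with the constant fill value. None stands in for the kdb+ null.
volume = [100, None, 250, None]
fill_value = 0

filled = [fill_value if v is None else v for v in volume]
```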
### remove_duplicates(df, return_type='q')

*Pandas equivalent:* `df.drop_duplicates()`

Retains only the first occurrence of unique rows in the dataset, discarding all subsequent duplicates. This is optimized using kdb+'s `distinct` primitive, making it exceptionally fast for de-duplicating high-volume tick data or batch audit logs.

**Parameters**

| Name | Type | Description |
|---|---|---|
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df_unique = qpd.remove_duplicates(df)
```
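First-occurrence de-duplication, as `distinct` performs it, can be sketched with a seen-set in plain Python (illustrative; the kdb+ primitive is vectorized):

```python
# Sketch of distinct-style de-duplication: keep the first occurrence of
# each unique row and preserve the original order.
rows = [(1, 'a'), (2, 'b'), (1, 'a'), (3, 'c'), (2, 'b')]

seen = set()
unique = []
for r in rows:
    if r not in seen:
        seen.add(r)
        unique.append(r)
```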
## Transformation

### cast(df, col, dtype, return_type='q')

*Pandas equivalent:* `df.astype()`

Changes the data type of a specific column. This is essential for correcting inference errors from raw data loads or optimizing storage for keyed lookups and joins. Supported types include standard numeric, text, and temporal primitives mapping directly to kdb+ internal types.

**Parameters**

| Name | Type | Description |
|---|---|---|
| col | str | The name of the target column for type conversion. |
| dtype | str | The target data type (e.g., 'int', 'symbol', 'float', 'timestamp'). |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df_cast = qpd.cast(df, col='userId', dtype='int')
```

**Supported Type Mappings**

- Numeric: `int`, `long`, `real`, `float`
- Text: `string`, `symbol`
- kdb+ primitives: `i`, `j`, `f`, `s`, `c`
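The per-column conversion can be sketched as a lookup from dtype name to a converter applied element-wise. The converter table below is an illustrative Python stand-in, not the library's actual cast machinery:

```python
# Sketch of cast() semantics: resolve the dtype name to a converter and
# apply it to every element of one column (illustrative stand-ins for
# the real kdb+ casts).
casters = {'int': int, 'long': int, 'float': float, 'string': str}

col = ['1', '2', '3']            # e.g. a userId column inferred as text
cast_col = [casters['int'](v) for v in col]
```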
### rename(df, columns, return_type='q')

*Pandas equivalent:* `df.rename()`

Updates column headers using a dictionary mapping. This operation standardizes schemas across disparate data sources with zero data movement, as it modifies the table metadata directly in the kdb+ workspace.

**Parameters**

| Name | Type | Description |
|---|---|---|
| columns | dict | A dictionary mapping old column names to new ones, e.g., {'old_name': 'new_name'}. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df_renamed = qpd.rename(df, columns={'old_col': 'new_col'})
```
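The mapping semantics are the same as pandas: mapped names are replaced, unmapped names pass through. A pure-Python sketch (illustrative; the library only rewrites metadata, not data):

```python
# Sketch of rename() semantics: remap headers via a dictionary while
# leaving unmapped column names untouched.
table = {'old_col': [1, 2], 'other': [3, 4]}
mapping = {'old_col': 'new_col'}

renamed = {mapping.get(name, name): values for name, values in table.items()}
```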
### drop_col(df, cols, return_type='q')

*Pandas equivalent:* `df.drop()`

Removes unwanted columns from the dataset to reduce the memory footprint and simplify the schema before performing high-compute operations like joins or exports.

**Parameters**

| Name | Type | Description |
|---|---|---|
| cols | str or list | A single column name or a list of names to be discarded from the table. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df_reduced = qpd.drop_col(df, cols=['metadata', 'debug_flag'])
```
## Grouping

### groupby_sum(df, by_cols, sum_col, return_type='q')

*Pandas equivalent:* `df.groupby().sum()`

Aggregates records by key columns and computes the sum of a target numeric column. This leverages kdb+'s high-volume aggregation engine, which is particularly efficient for time-series and financial transaction datasets.

**Parameters**

| Name | Type | Description |
|---|---|---|
| by_cols | str or list | One or more columns to use as grouping keys. |
| sum_col | str | The numeric column to be aggregated. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

res = qpd.groupby_sum(df, by_cols='category', sum_col='revenue')
```
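Keyed aggregation can be sketched as a per-key accumulator in plain Python (illustrative values; the kdb+ engine does this as a vectorized group-by):

```python
# Sketch of groupby-sum semantics: accumulate the numeric column
# under each grouping key.
categories = ['food', 'tech', 'food', 'tech', 'food']
revenue = [10, 100, 20, 50, 5]

totals = {}
for key, value in zip(categories, revenue):
    totals[key] = totals.get(key, 0) + value
```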
### groupby_avg(df, by_cols, avg_col, return_type='q')

*Pandas equivalent:* `df.groupby().mean()`

Computes the arithmetic mean for grouped records. By utilizing kdb+'s native vector speed, this operation can process millions of groups per second, far exceeding the performance of traditional row-based iterators.

**Parameters**

| Name | Type | Description |
|---|---|---|
| by_cols | str or list | One or more columns to use as grouping keys. |
| avg_col | str | The numeric column for which to calculate the average. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

res = qpd.groupby_avg(df, by_cols=['region', 'year'], avg_col='sales')
```
## Joining

### merge(left, right, how='inner', on=None, left_on=None, right_on=None, return_type='q')

*Pandas equivalent:* `pd.merge()`

Unifies multiple join strategies (inner, left, right, outer) into a single pandas-compliant interface. This operation leverages kdb+'s specialized join primitives (e.g., `ij`, `lj`, `uj`) to perform high-speed table intersections and unions without leaving the vector engine.

**Parameters**

| Name | Type | Description |
|---|---|---|
| left, right | Source | The two qutePandas or PyKX tables to be merged. |
| how | str | Type of merge: 'inner', 'left', 'right', or 'outer'. |
| on | str or list | Column name(s) to join on. Must be found in both tables. |
| left_on, right_on | str or list | Specific column names to join on in the left and right tables respectively. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

# Inner join on common key
res = qpd.merge(df_a, df_b, on='id', how='inner')

# Left join with different key names
res = qpd.merge(df_a, df_b, left_on='user_id', right_on='uid', how='left')
```
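The inner-join case (kdb+'s `ij`) keeps only keys present in both tables. A minimal pure-Python sketch of those semantics with illustrative record data:

```python
# Sketch of inner-join semantics on a shared key: index the right table
# by key, then keep only left rows whose key also appears on the right.
left = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
right = [{'id': 2, 'score': 9}, {'id': 3, 'score': 7}]

right_by_id = {r['id']: r for r in right}
inner = [{**row, **right_by_id[row['id']]}
         for row in left if row['id'] in right_by_id]
```

A left join would instead keep every left row, filling missing right-side fields with nulls.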
## I/O

### from_csv(path, return_type='q')

*Pandas equivalent:* `pd.read_csv()`

Bulk ingests CSV files directly into kdb+ memory-mapped space. This method is specifically designed to bypass Python's memory limits, allowing for the rapid analysis of multi-gigabyte datasets that would otherwise trigger Out-Of-Memory (OOM) errors in standard Pandas.

**Parameters**

| Name | Type | Description |
|---|---|---|
| path | str | The filesystem path to the target CSV file. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df = qpd.from_csv('historical_data.csv')
```
### to_csv(df, path)

*Pandas equivalent:* `df.to_csv()`

Serializes kdb+ tables to standard CSV format. This utility ensures cross-system interoperability, making it easy to share results with stakeholders or downstream systems that do not have a kdb+ runtime.

**Parameters**

| Name | Type | Description |
|---|---|---|
| df | Source | The table to be exported. |
| path | str | The target file path for the exported CSV. |

**Example**

```python
import qutePandas as qpd

qpd.to_csv(df, 'processed_results.csv')
```
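The round trip that `to_csv`/`from_csv` provide reduces to standard CSV text, which any downstream system can read. A stdlib sketch of that interchange in memory (illustrative data; the real functions stream through kdb+):

```python
import csv
import io

# Sketch of the CSV interchange format: write a header plus rows,
# then read them back with the stdlib csv module.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['sym', 'price'])
writer.writerows([['AAPL', '101.5'], ['MSFT', '99.0']])

buf.seek(0)
rows = list(csv.reader(buf))
```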
## Apply

### apply(df, func, axis=0, return_type='q')

*Pandas equivalent:* `df.apply()`

Executes arbitrary Python logic on the dataset. This is used for complex domain transformations that cannot be elegantly expressed via native kdb+ primitives. While highly flexible, users should prefer `axis=0` (columnar) for performance, as row-wise operations incur significant overhead.

**Parameters**

| Name | Type | Description |
|---|---|---|
| func | callable | The Python function or lambda to apply. |
| axis | int | 0 for columnar application (efficient), 1 for row-wise application (slow fallback). |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

res = qpd.apply(df, func=lambda x: x * 1.05, axis=0)
```
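The performance gap between the two axes can be sketched in plain Python: a columnar apply touches each vector once, while a row-wise apply must reconstruct a record per row (illustrative data; the actual overhead sits at the kdb+/Python boundary):

```python
# Sketch of the axis=0 vs axis=1 cost difference.
table = {'price': [100.0, 200.0], 'qty': [1, 2]}

# axis=0: the function runs once per column vector (efficient).
scaled = {name: [v * 1.05 for v in col] for name, col in table.items()}

# axis=1: each row must be rebuilt as a record first (slow fallback).
rows = [dict(zip(table, rec)) for rec in zip(*table.values())]
```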
### apply_col(df, col, func, return_type='q')

*Pandas equivalent:* `df[col].apply()`

Transforms a single targeted column using granular logic. This allows for specific data cleaning or feature extraction on a per-column basis while maintaining high table-wide performance for all other attributes.

**Parameters**

| Name | Type | Description |
|---|---|---|
| col | str | The name of the column to transform. |
| func | callable | The function to apply to each element of the specified column. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

res = qpd.apply_col(df, col='price', func=lambda x: round(x, 2))
```
## Introspection

### dtypes(df, return_type='q')

*Pandas equivalent:* `df.dtypes`

Retrieves the underlying data types for all columns in the dataset. This uses kdb+'s `meta` primitive to expose the internal representation (e.g., symbol, float, timestamp) of the data as managed by the vector engine.

**Parameters**

| Name | Type | Description |
|---|---|---|
| df | Source | The input qutePandas or PyKX Table. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

schema = qpd.dtypes(df)
```
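A meta-style result maps each column to a kdb+ type character. The sketch below infers those characters from Python values; the `j`/`f`/`s` codes (long, float, symbol) match the primitives listed under cast above, but the inference function itself is a hypothetical stand-in for the real `meta` call:

```python
# Sketch of meta-style type introspection: one kdb+ type character per
# column, inferred from its Python values (illustrative only).
def infer_type(values):
    if all(isinstance(v, bool) for v in values):
        return 'b'                      # boolean
    if all(isinstance(v, int) for v in values):
        return 'j'                      # long
    if all(isinstance(v, float) for v in values):
        return 'f'                      # float
    return 's'                          # symbol fallback

schema = {name: infer_type(col) for name, col in
          {'sym': ['a', 'b'], 'qty': [1, 2], 'px': [1.5, 2.5]}.items()}
```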