# API Reference
A comprehensive technical guide to the qutePandas public API. This library translates standard Pandas operations into optimized kdb+ primitives, offloading heavy computations to a high-performance vector engine while maintaining the developer experience of the Python data ecosystem.
## Core

### DataFrame(data, columns=None)

*Pandas equivalent:* `pd.DataFrame()`

Initialize a kdb+ memory-mapped table from standard Python objects. This constructor bridges the gap between row-major Python memory and columnar kdb+ performance, materializing data immediately into the backend. For out-of-core datasets larger than RAM, consider using `from_csv` with memory mapping.

**Parameters**

| Name | Type | Description |
|---|---|---|
| data | dict, list, or pd.DataFrame | Source data. Dictionary keys become column headers; nested lists are treated as row records. Existing DataFrames are efficiently mirrored into the kdb+ workspace. |
| columns | list of str | Explicit schema definition. Required when the input data is a list without intrinsic header information. |

**Example**

```python
import qutePandas as qpd

data = {'A': [1, 2], 'B': [3, 4]}
df = qpd.DataFrame(data)
```
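The row-major-to-columnar bridge can be sketched in plain Python. This is illustrative only (the real materialization happens inside kdb+), and the helper name `to_columnar` is hypothetical:

```python
# Sketch: how dict and list-of-rows inputs map to a columnar layout,
# mirroring what the DataFrame constructor does conceptually.
# Illustrative only -- the real work happens inside kdb+.

def to_columnar(data, columns=None):
    """Normalize dict or list-of-rows input into {column: values}."""
    if isinstance(data, dict):
        # Dictionary keys become column headers.
        return {k: list(v) for k, v in data.items()}
    # Nested lists are row records; an explicit schema is required.
    if columns is None:
        raise ValueError("columns is required for list input")
    return {c: [row[i] for row in data] for i, c in enumerate(columns)}

cols = to_columnar([[1, 3], [2, 4]], columns=['A', 'B'])
```

Both input shapes end up in the same column-vector form, which is what the kdb+ backend operates on.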
### connect(license_path=None)

*Pandas equivalent:* N/A

Global initializer for the kdb+ runtime environment. This function verifies license validity and manages critical environment variables such as `QLIC` and `QHOME`, which are required for embedded PyKX operations. It must be called once at the start of any session.

**Parameters**

| Name | Type | Description |
|---|---|---|
| license_path | str | Optional explicit path to a kc.lic or k4.lic file. If omitted, the library searches standard paths and project-root folders. |

**Example**

```python
import qutePandas as qpd

qpd.connect()
```
### install_license(content, is_base64=True)

*Pandas equivalent:* N/A

Programmatically installs a kdb+ license token or file content. This is particularly useful for containerized or ephemeral environments (such as CI/CD pipelines) where environment variables are preferred over physical file deployments.

**Parameters**

| Name | Type | Description |
|---|---|---|
| content | str | The base64-encoded license token or raw content of a license file. |
| is_base64 | bool | Whether the content string is base64-encoded. Defaults to True. |

**Example**

```python
import qutePandas as qpd

qpd.install_license("YOUR_LICENSE_CONTENT")
```
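For CI/CD use, the base64 encoding step can be done with the standard library. A minimal sketch (the license bytes shown are placeholder content, not a real license):

```python
import base64

# Sketch: preparing license content for install_license in a CI pipeline.
# The raw file bytes are base64-encoded so they can travel safely in an
# environment variable; is_base64=True tells the library to decode them.
raw = b"placeholder-license-bytes"
token = base64.b64encode(raw).decode('ascii')

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(token)
```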
### print(obj, head=None, tail=None)

*Pandas equivalent:* `print(df.head())` / `print(df.tail())`

Display a kdb+ table with formatted output, including borders and column alignment. It avoids converting to pandas for better performance.

**Parameters**

| Name | Type | Description |
|---|---|---|
| obj | pykx.Table or pykx.KeyedTable | The table to display. |
| head | int | Number of rows from the beginning to display. If specified, tail is ignored. |
| tail | int | Number of rows from the end to display. |

**Example**

```python
import qutePandas as qpd

df = qpd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]})
qpd.print(df, head=5)
qpd.print(df, tail=3)
```
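Column-aligned rendering of this kind can be sketched in pure Python. The exact output format of `qpd.print` is not specified here; this `render` helper is a hypothetical stand-in showing one way to align columns without converting to pandas:

```python
# Sketch of column-aligned table rendering (illustrative pure-Python
# version; the real function formats pykx tables directly).
def render(columns):
    names = list(columns)
    # Each column is as wide as its header or widest value.
    widths = [max(len(n), *(len(str(v)) for v in columns[n])) for n in names]
    sep = '-+-'.join('-' * w for w in widths)
    header = ' | '.join(n.ljust(w) for n, w in zip(names, widths))
    rows = [' | '.join(str(v).ljust(w) for v, w in zip(r, widths))
            for r in zip(*columns.values())]
    return '\n'.join([header, sep] + rows)

out = render({'name': ['Alice', 'Bob'], 'age': [25, 30]})
```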
## Indexing & Selection

### loc(df, rows=None, cols=None, return_type='q')

*Pandas equivalent:* `df.loc[rows, cols]`

Purely label-based indexing for selection by label or boolean array. Currently supports filtering rows via boolean mask.

**Parameters**

| Name | Type | Description |
|---|---|---|
| rows | list or pykx.BooleanVector | Boolean mask or labels for row selection. |
| cols | str or list of str | Column names to select. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

mask = qpd.kx.q('>', df['price'], 100)
df_subset = qpd.loc(df, rows=mask, cols=['symbol', 'volume'])
```
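Boolean-mask selection follows the same semantics as elsewhere in the Python data ecosystem. A minimal pure-Python sketch of what the mask does (illustrative values, not library calls):

```python
# Sketch of boolean-mask row selection: the mask marks which rows
# survive, exactly as loc() applies it inside kdb+.
prices = [90, 150, 120, 80]
symbols = ['a', 'b', 'c', 'd']

mask = [p > 100 for p in prices]        # analogous to qpd.kx.q('>', col, 100)
selected = [s for s, keep in zip(symbols, mask) if keep]
```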
### iloc(df, rows=None, cols=None, return_type='q')

*Pandas equivalent:* `df.iloc[rows, cols]`

Purely integer-location based indexing for selection by position.

**Parameters**

| Name | Type | Description |
|---|---|---|
| rows | int, list, or slice | Row indices to select. |
| cols | int, list, or slice | Column indices to select. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

# Select first 10 rows and first 2 columns
df_subset = qpd.iloc(df, rows=slice(0, 10), cols=slice(0, 2))
```
## Cleaning

### dropna(df, return_type='q')

*Pandas equivalent:* `df.dropna()`

Eliminates any records containing null values across the entire dataset. This operation leverages kdb+'s efficient vector null-checking to prune incomplete data, preventing the propagation of nulls in downstream analytical models.

**Parameters**

| Name | Type | Description |
|---|---|---|
| df | Source | The input qutePandas or PyKX Table to be cleaned. |
| return_type | str | Controls the output format: 'q' for maximum performance or 'p' for immediate conversion back to Pandas. |

**Example**

```python
import qutePandas as qpd

df_clean = qpd.dropna(df)
```
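The record-level semantics can be sketched in plain Python, with `None` standing in for kdb+ nulls such as `0Nh` (illustrative only; the real check is a vector operation inside kdb+):

```python
# Sketch of dropna() semantics: discard every record that contains
# a null in any column. None stands in for the kdb+ null.
table = {'a': [1, None, 3], 'b': [10, 20, None]}

rows = list(zip(*table.values()))
kept = [r for r in rows if all(v is not None for v in r)]
clean = {c: [r[i] for r in kept] for i, c in enumerate(table)}
```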
### dropna_col(df, col, return_type='q')

*Pandas equivalent:* `df.dropna(subset=[col])`

Targeted null removal focused on a specific column. This is essential for datasets where some attributes are permitted to be null, but core identifier or feature columns must remain complete for valid computation.

**Parameters**

| Name | Type | Description |
|---|---|---|
| col | str | The target column to scan for nulls. The library handles various null representations (e.g., 0Nh for integers or "" for strings) automatically based on the column's underlying kdb+ type. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df_clean = qpd.dropna_col(df, col='price')
```
### fillna(df, col, value, return_type='q')

*Pandas equivalent:* `df.fillna()`

Replaces null entries with a specified constant value. This enables "zero-filling" or default assignment in feature engineering pipelines without creating expensive memory copies, using kdb+'s native `^` (fill) operator.

**Parameters**

| Name | Type | Description |
|---|---|---|
| col | str | The column to apply the fill operation to. |
| value | scalar | The replacement value. Python strings are automatically converted to kdb+ symbols where appropriate for optimized storage. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df_filled = qpd.fillna(df, col='volume', value=0)
```
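The `^` (fill) operator's semantics are simple to sketch: every null in the right-hand vector is replaced by the scalar on the left. A pure-Python illustration, with `None` standing in for the kdb+ null:

```python
# Sketch of kdb+'s ^ (fill) semantics: replace each null in the column
# with the constant fill value. None stands in for the kdb+ null.
volume = [100, None, 250, None]
fill_value = 0

filled = [fill_value if v is None else v for v in volume]
```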
### remove_duplicates(df, return_type='q')

*Pandas equivalent:* `df.drop_duplicates()`

Retains only the first occurrence of unique rows in the dataset, discarding all subsequent duplicates. This is optimized using kdb+'s `distinct` primitive, making it exceptionally fast for de-duplicating high-volume tick data or batch audit logs.

**Parameters**

| Name | Type | Description |
|---|---|---|
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df_unique = qpd.remove_duplicates(df)
```
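First-occurrence de-duplication, as `distinct` performs it, can be sketched with a seen-set in plain Python (illustrative; the kdb+ primitive is vectorized):

```python
# Sketch of distinct-style de-duplication: keep the first occurrence of
# each unique row and preserve the original order.
rows = [(1, 'a'), (2, 'b'), (1, 'a'), (3, 'c'), (2, 'b')]

seen = set()
unique = []
for r in rows:
    if r not in seen:
        seen.add(r)
        unique.append(r)
```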
## Transformation

### cast(df, col, dtype, return_type='q')

*Pandas equivalent:* `df.astype()`

Changes the data type of a specific column. This is essential for correcting inference errors from raw data loads or optimizing storage for keyed lookups and joins. Supported types include standard numeric, text, and temporal primitives mapping directly to kdb+ internal types.

**Parameters**

| Name | Type | Description |
|---|---|---|
| col | str | The name of the target column for type conversion. |
| dtype | str | The target data type (e.g., 'int', 'symbol', 'float', 'timestamp'). |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df_cast = qpd.cast(df, col='userId', dtype='int')
```

**Supported Type Mappings**

- Numeric: `int`, `long`, `real`, `float`
- Text: `string`, `symbol`
- kdb+ primitives: `i`, `j`, `f`, `s`, `c`
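The per-column conversion can be sketched as a lookup from dtype name to a converter applied element-wise. The converter table below is an illustrative Python stand-in, not the library's actual cast machinery:

```python
# Sketch of cast() semantics: resolve the dtype name to a converter and
# apply it to every element of one column (illustrative stand-ins for
# the real kdb+ casts).
casters = {'int': int, 'long': int, 'float': float, 'string': str}

col = ['1', '2', '3']            # e.g. a userId column inferred as text
cast_col = [casters['int'](v) for v in col]
```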
### rename(df, columns, return_type='q')

*Pandas equivalent:* `df.rename()`

Updates column headers using a dictionary mapping. This operation standardizes schemas across disparate data sources with zero data movement, as it modifies the table metadata directly in the kdb+ workspace.

**Parameters**

| Name | Type | Description |
|---|---|---|
| columns | dict | A dictionary mapping old column names to new ones, e.g., {'old_name': 'new_name'}. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df_renamed = qpd.rename(df, columns={'old_col': 'new_col'})
```
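The mapping semantics are the same as pandas: mapped names are replaced, unmapped names pass through. A pure-Python sketch (illustrative; the library only rewrites metadata, not data):

```python
# Sketch of rename() semantics: remap headers via a dictionary while
# leaving unmapped column names untouched.
table = {'old_col': [1, 2], 'other': [3, 4]}
mapping = {'old_col': 'new_col'}

renamed = {mapping.get(name, name): values for name, values in table.items()}
```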
### drop_col(df, cols, return_type='q')

*Pandas equivalent:* `df.drop()`

Removes unwanted columns from the dataset to reduce the memory footprint and simplify the schema before performing high-compute operations like joins or exports.

**Parameters**

| Name | Type | Description |
|---|---|---|
| cols | str or list | A single column name or a list of names to be discarded from the table. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df_reduced = qpd.drop_col(df, cols=['metadata', 'debug_flag'])
```
## Grouping

### groupby_sum(df, by_cols, sum_col, return_type='q')

*Pandas equivalent:* `df.groupby().sum()`

Aggregates records by key columns and computes the sum of a target numeric column. This leverages kdb+'s high-volume aggregation engine, which is particularly efficient for time-series and financial transaction datasets.

**Parameters**

| Name | Type | Description |
|---|---|---|
| by_cols | str or list | One or more columns to use as grouping keys. |
| sum_col | str | The numeric column to be aggregated. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

res = qpd.groupby_sum(df, by_cols='category', sum_col='revenue')
```
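Keyed aggregation can be sketched as a per-key accumulator in plain Python (illustrative values; the kdb+ engine does this as a vectorized group-by):

```python
# Sketch of groupby-sum semantics: accumulate the numeric column
# under each grouping key.
categories = ['food', 'tech', 'food', 'tech', 'food']
revenue = [10, 100, 20, 50, 5]

totals = {}
for key, value in zip(categories, revenue):
    totals[key] = totals.get(key, 0) + value
```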
### groupby_avg(df, by_cols, avg_col, return_type='q')

*Pandas equivalent:* `df.groupby().mean()`

Computes the arithmetic mean for grouped records. By utilizing kdb+'s native vector speed, this operation can process millions of groups per second, far exceeding the performance of traditional row-based iterators.

**Parameters**

| Name | Type | Description |
|---|---|---|
| by_cols | str or list | One or more columns to use as grouping keys. |
| avg_col | str | The numeric column for which to calculate the average. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

res = qpd.groupby_avg(df, by_cols=['region', 'year'], avg_col='sales')
```
## Joining

### merge(left, right, how='inner', on=None, left_on=None, right_on=None, return_type='q')

*Pandas equivalent:* `pd.merge()`

Unifies multiple join strategies (inner, left, right, outer) into a single pandas-compliant interface. This operation leverages kdb+'s specialized join primitives (e.g., `ij`, `lj`, `uj`) to perform high-speed table intersections and unions without leaving the vector engine.

**Parameters**

| Name | Type | Description |
|---|---|---|
| left, right | Source | The two qutePandas or PyKX tables to be merged. |
| how | str | Type of merge: 'inner', 'left', 'right', or 'outer'. |
| on | str or list | Column name(s) to join on. Must be found in both tables. |
| left_on, right_on | str or list | Specific column names to join on in the left and right tables respectively. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

# Inner join on common key
res = qpd.merge(df_a, df_b, on='id', how='inner')

# Left join with different key names
res = qpd.merge(df_a, df_b, left_on='user_id', right_on='uid', how='left')
```
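The inner-join case (kdb+'s `ij`) keeps only keys present in both tables. A minimal pure-Python sketch of those semantics with illustrative record data:

```python
# Sketch of inner-join semantics on a shared key: index the right table
# by key, then keep only left rows whose key also appears on the right.
left = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
right = [{'id': 2, 'score': 9}, {'id': 3, 'score': 7}]

right_by_id = {r['id']: r for r in right}
inner = [{**row, **right_by_id[row['id']]}
         for row in left if row['id'] in right_by_id]
```

A left join would instead keep every left row, filling missing right-side fields with nulls.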
## I/O

### from_csv(path, return_type='q')

*Pandas equivalent:* `pd.read_csv()`

Bulk ingests CSV files directly into kdb+ memory-mapped space. This method is specifically designed to bypass Python's memory limits, allowing for the rapid analysis of multi-gigabyte datasets that would otherwise trigger Out-Of-Memory (OOM) errors in standard Pandas.

**Parameters**

| Name | Type | Description |
|---|---|---|
| path | str | The filesystem path to the target CSV file. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

df = qpd.from_csv('historical_data.csv')
```
### to_csv(df, path)

*Pandas equivalent:* `df.to_csv()`

Serializes kdb+ tables to standard CSV format. This utility ensures cross-system interoperability, making it easy to share results with stakeholders or downstream systems that do not have a kdb+ runtime.

**Parameters**

| Name | Type | Description |
|---|---|---|
| df | Source | The table to be exported. |
| path | str | The target file path for the exported CSV. |

**Example**

```python
import qutePandas as qpd

qpd.to_csv(df, 'processed_results.csv')
```
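The round trip that `to_csv`/`from_csv` provide reduces to standard CSV text, which any downstream system can read. A stdlib sketch of that interchange in memory (illustrative data; the real functions stream through kdb+):

```python
import csv
import io

# Sketch of the CSV interchange format: write a header plus rows,
# then read them back with the stdlib csv module.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['sym', 'price'])
writer.writerows([['AAPL', '101.5'], ['MSFT', '99.0']])

buf.seek(0)
rows = list(csv.reader(buf))
```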
## Apply

### apply(df, func, axis=0, return_type='q')

*Pandas equivalent:* `df.apply()`

Executes arbitrary Python logic on the dataset. This is used for complex domain transformations that cannot be elegantly expressed via native kdb+ primitives. While highly flexible, users should prefer `axis=0` (columnar) for performance, as row-wise operations incur significant overhead.

**Parameters**

| Name | Type | Description |
|---|---|---|
| func | callable | The Python function or lambda to apply. |
| axis | int | 0 for columnar application (efficient), 1 for row-wise application (slow fallback). |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

res = qpd.apply(df, func=lambda x: x * 1.05, axis=0)
```
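The performance gap between the two axes can be sketched in plain Python: a columnar apply touches each vector once, while a row-wise apply must reconstruct a record per row (illustrative data; the actual overhead sits at the kdb+/Python boundary):

```python
# Sketch of the axis=0 vs axis=1 cost difference.
table = {'price': [100.0, 200.0], 'qty': [1, 2]}

# axis=0: the function runs once per column vector (efficient).
scaled = {name: [v * 1.05 for v in col] for name, col in table.items()}

# axis=1: each row must be rebuilt as a record first (slow fallback).
rows = [dict(zip(table, rec)) for rec in zip(*table.values())]
```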
### apply_col(df, col, func, return_type='q')

*Pandas equivalent:* `df[col].apply()`

Transforms a single targeted column using granular logic. This allows for specific data cleaning or feature extraction on a per-column basis while maintaining high table-wide performance for all other attributes.

**Parameters**

| Name | Type | Description |
|---|---|---|
| col | str | The name of the column to transform. |
| func | callable | The function to apply to each element of the specified column. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

res = qpd.apply_col(df, col='price', func=lambda x: round(x, 2))
```
## Introspection

### dtypes(df, return_type='q')

*Pandas equivalent:* `df.dtypes`

Retrieves the underlying data types for all columns in the dataset. This uses kdb+'s `meta` primitive to expose the internal representation (e.g., symbol, float, timestamp) of the data as managed by the vector engine.

**Parameters**

| Name | Type | Description |
|---|---|---|
| df | Source | The input qutePandas or PyKX Table. |
| return_type | str | Controls the output format: 'q' or 'p'. |

**Example**

```python
import qutePandas as qpd

schema = qpd.dtypes(df)
```
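A meta-style result maps each column to a kdb+ type character. The sketch below infers those characters from Python values; the `j`/`f`/`s` codes (long, float, symbol) match the primitives listed under cast above, but the inference function itself is a hypothetical stand-in for the real `meta` call:

```python
# Sketch of meta-style type introspection: one kdb+ type character per
# column, inferred from its Python values (illustrative only).
def infer_type(values):
    if all(isinstance(v, bool) for v in values):
        return 'b'                      # boolean
    if all(isinstance(v, int) for v in values):
        return 'j'                      # long
    if all(isinstance(v, float) for v in values):
        return 'f'                      # float
    return 's'                          # symbol fallback

schema = {name: infer_type(col) for name, col in
          {'sym': ['a', 'b'], 'qty': [1, 2], 'px': [1.5, 2.5]}.items()}
```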