API Reference

A comprehensive technical guide to the qutePandas public API. This library translates standard Pandas operations into optimized kdb+ primitives, offloading heavy computations to a high-performance vector engine while maintaining the developer experience of the Python data ecosystem.

Core

DataFrame(data, columns=None) pd.DataFrame()
Initialize a kdb+ memory-mapped table from standard Python objects. This constructor bridges the gap between row-major Python memory and columnar kdb+ performance, materializing data immediately into the backend. For out-of-core datasets larger than RAM, consider using from_csv with memory mapping.
Parameters
Name Description
data (dict, list, or pd.DataFrame): Source data. Dictionary keys become column headers; nested lists are treated as row records. Existing DataFrames are efficiently mirrored into the kdb+ workspace.
columns (list of str): Explicit schema definition. This is required when the input data is a list without intrinsic header information.
Example
import qutePandas as qpd
data = {'A': [1, 2], 'B': [3, 4]}
df = qpd.DataFrame(data)
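The two input shapes described above (a dict of columns versus a list of row records plus an explicit columns list) can be illustrated with plain pandas, which this constructor mirrors. This is a minimal sketch of the intended semantics only; it does not call qutePandas, and it assumes qpd.DataFrame interprets nested lists the same way pd.DataFrame does.

```python
import pandas as pd

# Column-oriented input: dictionary keys become column headers.
from_dict = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Row-oriented input: each inner list is one record, so the
# column names must be supplied explicitly.
rows = [[1, 3], [2, 4]]
from_rows = pd.DataFrame(rows, columns=["A", "B"])

# Both inputs describe the same table.
same_table = from_rows.equals(from_dict)
```

With qutePandas, the corresponding call for the row-oriented case would presumably be qpd.DataFrame(rows, columns=['A', 'B']).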
connect(license_path=None) N/A
Global initializer for the kdb+ runtime environment. This function verifies license validity and manages critical environment variables such as QLIC and QHOME, which are required for embedded PyKX operations. It must be called once at the start of any session.
Parameters
Name Description
license_path (str): Optional explicit path to a kc.lic or k4.lic file. If omitted, the library searches standard paths and project-root folders.
Example
import qutePandas as qpd
qpd.connect()
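When the license lives in a non-standard location, one option is to resolve the path yourself and pass it explicitly. The sketch below uses only the standard library; find_license_file is a hypothetical helper, not part of qutePandas, and the directories you search are your own choice (the library's internal search order is not documented here).

```python
import os
import tempfile
from pathlib import Path

# File names mentioned in the license_path description above.
LICENSE_NAMES = ("kc.lic", "k4.lic")

def find_license_file(candidate_dirs):
    """Return the first kc.lic / k4.lic found in candidate_dirs, else None."""
    for directory in candidate_dirs:
        for name in LICENSE_NAMES:
            candidate = Path(directory) / name
            if candidate.is_file():
                return str(candidate)
    return None

# Demo with a throwaway directory holding a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "kc.lic").write_text("placeholder, not a real license")
    found = find_license_file([os.path.join(tmp, "missing"), tmp])
    found_name = Path(found).name
    # qpd.connect(license_path=found)  # requires qutePandas and a valid license

missing = find_license_file([])
```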
install_license(content, is_base64=True) N/A
Programmatically installs a kdb+ license token or file content. This is particularly useful for containerized or ephemeral environments (such as CI/CD pipelines) where environment variables are preferred over physical file deployments.
Parameters
Name Description
content (str): The base64 encoded license token or raw content of a license file.
is_base64 (bool): Specifies if the content string is base64 encoded. Defaults to True.
Example
import qutePandas as qpd
qpd.install_license("YOUR_LICENSE_CONTENT")
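In a CI/CD setting the license is often stored as a secret string. The sketch below shows how raw license bytes could be turned into a base64 token with the standard library before being handed to install_license. The bytes are a placeholder and encode_license is a hypothetical helper; the qutePandas calls are left commented out because they need a real license.

```python
import base64

def encode_license(raw_bytes: bytes) -> str:
    """Encode raw license bytes as an ASCII base64 token."""
    return base64.b64encode(raw_bytes).decode("ascii")

raw = b"placeholder bytes, not a real license"
token = encode_license(raw)

# Decoding the token recovers the original bytes.
round_trip = base64.b64decode(token)

# import qutePandas as qpd
# qpd.install_license(token)                          # is_base64=True by default
# qpd.install_license(raw.decode(), is_base64=False)  # raw file content
```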
print(obj, head=None, tail=None) print(df.head()) / print(df.tail())
Display a kdb+ table with formatted output including borders and column alignment. Avoids converting to pandas for better performance.
Parameters
Name Description
obj (pykx.Table or pykx.KeyedTable): The table to display.
head (int): Number of rows from the beginning to display. If specified, tail is ignored.
tail (int): Number of rows from the end to display.
Example
import qutePandas as qpd
df = qpd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]})
qpd.print(df, head=5)
qpd.print(df, tail=3)
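The precedence rule between head and tail (head wins when both are given) can be pictured with a small pure-Python sketch. select_rows is a hypothetical stand-in operating on a list; it is not how qutePandas is implemented, only an illustration of the documented behavior.

```python
def select_rows(rows, head=None, tail=None):
    """Pick rows to display: head takes precedence over tail."""
    if head is not None:
        return rows[:head]
    if tail is not None:
        return rows[-tail:] if tail > 0 else []
    return rows

rows = list(range(10))

both = select_rows(rows, head=3, tail=2)   # tail is ignored
last_two = select_rows(rows, tail=2)
everything = select_rows(rows)
```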

Indexing & Selection

loc(df, rows=None, cols=None, return_type='q') df.loc[rows, cols]
Label-based indexing for selecting rows and columns, mirroring df.loc. Row filtering is currently supported via a boolean mask.
Parameters
Name Description
rows (list or pykx.BooleanVector): Boolean mask or labels for row selection.
cols (str or list of str): Column names to select.
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
mask = qpd.kx.q('>', df['price'], 100)
df_subset = qpd.loc(df, rows=mask, cols=['symbol', 'volume'])
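For comparison, the same selection written against plain pandas looks like this. It is a sketch of the df.loc[rows, cols] semantics that qpd.loc is described as mirroring; the sample data is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "symbol": ["AAPL", "MSFT", "IBM"],
    "price": [150.0, 95.0, 120.0],
    "volume": [1000, 2000, 3000],
})

# Boolean mask: True for the rows to keep.
mask = df["price"] > 100

# Keep the masked rows and only the two requested columns.
subset = df.loc[mask, ["symbol", "volume"]]
```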
iloc(df, rows=None, cols=None, return_type='q') df.iloc[rows, cols]
Pure integer-location based indexing for selection by position.
Parameters
Name Description
rows (int, list, or slice): Row indices to select.
cols (int, list, or slice): Column indices to select.
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
# Select first 10 rows and first 2 columns
df_subset = qpd.iloc(df, rows=slice(0, 10), cols=slice(0, 2))
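Python slices are half-open, so slice(0, 10) covers positions 0 through 9. The pandas sketch below shows that convention; whether qpd.iloc treats the end bound identically is an assumption based on its stated df.iloc equivalence.

```python
import pandas as pd

df = pd.DataFrame({
    "a": range(20),
    "b": range(20, 40),
    "c": range(40, 60),
})

# First 10 rows (positions 0..9) and first 2 columns (positions 0..1).
subset = df.iloc[slice(0, 10), slice(0, 2)]
```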

Cleaning

dropna(df, return_type='q') df.dropna()
Eliminates any records containing null values across the entire dataset. This operation leverages kdb+'s efficient vector null-checking to prune incomplete data, preventing the propagation of nulls in downstream analytical models.
Parameters
Name Description
df (Source): The input qutePandas or PyKX Table to be cleaned.
return_type (str): Controls the output format: 'q' for maximum performance or 'p' for immediate conversion back to Pandas.
Example
import qutePandas as qpd
df_clean = qpd.dropna(df)
dropna_col(df, col, return_type='q') df.dropna(subset=[col])
Targeted null removal focused on a specific column. This is essential for datasets where some attributes are permitted to be null, but core identifier or feature columns must remain complete for valid computation.
Parameters
Name Description
col (str): The target column to scan for nulls. The library handles type-specific null representations (e.g., 0Ni for ints, 0N for longs, or "" for strings) automatically based on the column's underlying kdb+ type.
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
df_clean = qpd.dropna_col(df, col='price')
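The difference between table-wide and single-column null removal is easiest to see side by side. The sketch below uses the pandas equivalents named in the signatures (df.dropna() and df.dropna(subset=[col])) on invented data; it illustrates the semantics only and does not call qutePandas.

```python
import pandas as pd

df = pd.DataFrame({
    "symbol": ["A", "B", None, "D"],
    "price": [10.0, None, 30.0, 40.0],
})

# Table-wide: drop any row with a null in any column.
all_complete = df.dropna()

# Column-targeted: drop only rows where 'price' is null.
price_complete = df.dropna(subset=["price"])
```

Here the row with a missing symbol survives the column-targeted call because only price is checked.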
fillna(df, col, value, return_type='q') df.fillna()
Replaces null entries with a specified constant value. This enables "zero-filling" or default assignment in feature engineering pipelines without creating expensive memory copies, using kdb+'s native ^ (fill) operator.
Parameters
Name Description
col (str): The column to apply the fill operation to.
value (scalar): The replacement value. Python strings are automatically converted to kdb+ symbols where appropriate for optimized storage.
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
df_filled = qpd.fillna(df, col='volume', value=0)
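Because fillna here takes a single column, its closest pandas counterpart is a fill on that column alone rather than on the whole frame. A minimal pandas sketch of that behavior, on invented data:

```python
import pandas as pd

df = pd.DataFrame({
    "volume": [100.0, None, 300.0],
    "price": [1.5, None, 2.5],
})

# Fill nulls in 'volume' only; 'price' keeps its null.
filled = df.assign(volume=df["volume"].fillna(0))

remaining_price_nulls = int(filled["price"].isna().sum())
```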
remove_duplicates(df, return_type='q') df.drop_duplicates()
Retains only the first occurrence of each distinct row, discarding all subsequent duplicates. This is optimized using kdb+'s distinct primitive, making it exceptionally fast for de-duplicating high-volume tick data or batch audit logs.
Parameters
Name Description
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
df_unique = qpd.remove_duplicates(df)
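Rows count as duplicates only when every column matches. The pandas sketch below (the df.drop_duplicates() equivalent, on invented data) shows a full-row duplicate being dropped while a row that shares only its id is kept.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 1, 3, 2],
    "val": ["a", "b", "a", "c", "x"],
})

# Row 2 repeats row 0 exactly and is dropped;
# row 4 shares id=2 with row 1 but differs in 'val', so it stays.
unique = df.drop_duplicates(keep="first")
```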

Transformation

cast(df, col, dtype, return_type='q') df.astype()
Changes the data type of a specific column. This is essential for correcting inference errors from raw data loads or optimizing storage for keyed lookups and joins. Supported types include standard numeric, text, and temporal primitives mapping directly to kdb+ internal types.
Parameters
Name Description
col (str): The name of the target column for type conversion.
dtype (str): The target data type (e.g., 'int', 'symbol', 'float', 'timestamp').
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
df_cast = qpd.cast(df, col='userId', dtype='int')
Supported Type Mappings
Numeric: int, long, real, float
Text: string, symbol
KDB Primitives: i, j, f, s, c
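The single-letter entries are q's type characters. The sketch below spells out what each one stands for in q; resolve_dtype is a hypothetical helper for illustration, and how qutePandas resolves these aliases internally is an assumption.

```python
# q type characters and the type names they denote in q.
Q_TYPE_CHARS = {
    "i": "int",
    "j": "long",
    "f": "float",
    "s": "symbol",
    "c": "char",
}

def resolve_dtype(dtype: str) -> str:
    """Expand a q type character to its type name; pass other names through."""
    return Q_TYPE_CHARS.get(dtype, dtype)

long_name = resolve_dtype("j")
unchanged = resolve_dtype("symbol")
```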
rename(df, columns, return_type='q') df.rename()
Updates column headers using a dictionary mapping. This operation standardizes schemas across disparate data sources with zero data movement, as it modifies the table metadata directly in the kdb+ workspace.
Parameters
Name Description
columns (dict): A dictionary mapping old column names to new ones, e.g., {'old_name': 'new_name'}.
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
df_renamed = qpd.rename(df, columns={'old_col': 'new_col'})
drop_col(df, cols, return_type='q') df.drop()
Removes unwanted columns from the dataset to reduce the memory footprint and simplify the schema before performing high-compute operations like joins or exports.
Parameters
Name Description
cols (str or list): A single column name or a list of names to be discarded from the table.
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
df_reduced = qpd.drop_col(df, cols=['metadata', 'debug_flag'])

Grouping

groupby_sum(df, by_cols, sum_col, return_type='q') df.groupby().sum()
Aggregates records by key columns and computes the sum of a target numeric column. This leverages kdb+'s high-volume aggregation engine, which is particularly efficient for time-series and financial transaction datasets.
Parameters
Name Description
by_cols (str or list): One or more columns to use as grouping keys.
sum_col (str): The numeric column to be aggregated.
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
res = qpd.groupby_sum(df, by_cols='category', sum_col='revenue')
groupby_avg(df, by_cols, avg_col, return_type='q') df.groupby().mean()
Computes the arithmetic mean for grouped records. By using kdb+'s native vector operations, this can process millions of groups per second, far exceeding the performance of traditional row-based iterators.
Parameters
Name Description
by_cols (str or list): One or more columns to use as grouping keys.
avg_col (str): The numeric column for which to calculate the average.
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
res = qpd.groupby_avg(df, by_cols=['region', 'year'], avg_col='sales')
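To make the expected results concrete, here is a small worked example using the pandas equivalents (df.groupby().sum() and df.groupby().mean()) on invented data. It shows a single grouping key for the sum and a composite key for the mean; it illustrates the semantics only and does not call qutePandas.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "year": [2023, 2023, 2023, 2024, 2024],
    "sales": [10.0, 20.0, 5.0, 7.0, 9.0],
})

# One key: total sales per region.
totals = df.groupby("region", as_index=False)["sales"].sum()

# Two keys: mean sales per (region, year) pair.
means = df.groupby(["region", "year"], as_index=False)["sales"].mean()
```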

Joining

merge(left, right, how='inner', on=None, left_on=None, right_on=None, return_type='q') pd.merge()
Unifies multiple join strategies (inner, left, right, outer) into a single pandas-compliant interface. This operation leverages kdb+'s specialized join primitives (e.g., ij, lj, uj) to perform high-speed table intersections and unions without leaving the vector engine.
Parameters
Name Description
left, right (Source): The two qutePandas or PyKX tables to be merged.
how (str): Type of merge: 'inner', 'left', 'right', or 'outer'.
on (str or list): Column name(s) to join on. Must be found in both tables.
left_on, right_on (str or list): Specific column names to join on in the left and right tables respectively.
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
# Inner join on common key
res = qpd.merge(df_a, df_b, on='id', how='inner')

# Left join with different key names
res = qpd.merge(df_a, df_b, left_on='user_id', right_on='uid', how='left')
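The effect of the how argument is clearest when the same two tables are joined several ways. The sketch below uses pd.merge, the stated pandas equivalent, on invented data with differently named keys; it illustrates which rows each strategy keeps and does not call qutePandas.

```python
import pandas as pd

df_a = pd.DataFrame({"user_id": [1, 2, 3], "name": ["ann", "bob", "cy"]})
df_b = pd.DataFrame({"uid": [1, 3, 4], "score": [90, 70, 50]})

keys = dict(left_on="user_id", right_on="uid")

# inner: only keys present in both tables (1 and 3).
inner = pd.merge(df_a, df_b, how="inner", **keys)

# left: every row of df_a; unmatched rows get a null score.
left = pd.merge(df_a, df_b, how="left", **keys)

# outer: union of keys from both tables (1, 2, 3, 4).
outer = pd.merge(df_a, df_b, how="outer", **keys)
```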

I/O

from_csv(path, return_type='q') pd.read_csv()
Bulk ingests CSV files directly into kdb+ memory-mapped space. This method is specifically designed to bypass Python's memory limits, allowing for the rapid analysis of multi-gigabyte datasets that would otherwise trigger Out-Of-Memory (OOM) errors in standard Pandas.
Parameters
Name Description
path (str): The filesystem path to the target CSV file.
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
df = qpd.from_csv('historical_data.csv')
to_csv(df, path) df.to_csv()
Serializes kdb+ tables to standard CSV format. This utility ensures cross-system interoperability, making it easy to share results with stakeholders or downstream systems that do not have a kdb+ runtime.
Parameters
Name Description
df (Source): The table to be exported.
path (str): The target file path for the exported CSV.
Example
import qutePandas as qpd
qpd.to_csv(df, 'processed_results.csv')

Apply

apply(df, func, axis=0, return_type='q') df.apply()
Executes arbitrary Python logic on the dataset. This is used for complex domain transformations that cannot be elegantly expressed via native kdb+ primitives. While highly flexible, users should prefer axis=0 (columnar) for performance, as row-wise operations incur significant overhead.
Parameters
Name Description
func (callable): The Python function or lambda to apply.
axis (int): 0 for columnar application (efficient), 1 for row-wise application (slow fallback).
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
res = qpd.apply(df, func=lambda x: x * 1.05, axis=0)
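The axis argument changes what the function receives: a whole column for axis=0, a single row for axis=1. The pandas sketch below (the df.apply equivalent, on invented data) shows both forms; it illustrates the calling convention only and assumes qpd.apply follows the same one.

```python
import pandas as pd

df = pd.DataFrame({"price": [100.0, 200.0], "fee": [1.0, 2.0]})

# axis=0: the lambda is called once per column with the full column,
# so the multiplication is vectorized.
scaled = df.apply(lambda col: col * 1.05, axis=0)

# axis=1: the lambda is called once per row, which is far slower
# on large tables.
row_totals = df.apply(lambda row: row["price"] + row["fee"], axis=1)
```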
apply_col(df, col, func, return_type='q') df[col].apply()
Transforms a single targeted column using granular logic. This allows for specific data cleaning or feature extraction on a per-column basis while maintaining high table-wide performance for all other attributes.
Parameters
Name Description
col (str): The name of the column to transform.
func (callable): The function to apply to each element of the specified column.
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
res = qpd.apply_col(df, col='price', func=lambda x: round(x, 2))

Introspection

dtypes(df, return_type='q') df.dtypes
Retrieves the underlying data types for all columns in the dataset. This uses kdb+'s meta primitive to expose the internal representation (e.g., symbol, float, timestamp) of the data as managed by the vector engine.
Parameters
Name Description
df (Source): The input qutePandas or PyKX Table.
return_type (str): Controls the output format: 'q' or 'p'.
Example
import qutePandas as qpd
schema = qpd.dtypes(df)