Working with Tabular Data in Python
High-performance data processing and schema validation for tabular data built on Polars (a Rust-based DataFrame library).
Installation
pip install fairspec
Getting Started
The table package provides core utilities for working with tabular data:
normalize_table - Convert table data to match a schema
denormalize_table - Convert normalized data back to raw format
infer_table_schema_from_table - Automatically infer schema from table data
inspect_table - Get table structure information
query_table - Query tables using SQL-like syntax
For example:
from fairspec import load_csv_table, infer_table_schema_from_table, Resource
table = load_csv_table(Resource(data="data.csv"))
schema = infer_table_schema_from_table(table)
Basic Usage
Schema Inference
Automatically infer Table Schema from data:
import polars as pl
from fairspec import infer_table_schema_from_table

table = pl.DataFrame({
    "id": ["1", "2", "3"],
    "price": ["10.50", "25.00", "15.75"],
    "date": ["2023-01-15", "2023-02-20", "2023-03-25"],
    "active": ["true", "false", "true"],
}).lazy()
schema = infer_table_schema_from_table(table, sample_rows=100, confidence=0.9)
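One way to read the sample_rows and confidence parameters: inference looks at up to sample_rows values per column and accepts a candidate type only when at least the confidence fraction of those values parse as that type. The sketch below illustrates that idea in plain Python. It is an assumption about the general technique, not fairspec's actual implementation; infer_type, CANDIDATES, and the helper functions are hypothetical names.

```python
from datetime import date


def _parses(parser, text):
    """Return True if parser(text) succeeds without raising ValueError."""
    try:
        parser(text)
        return True
    except ValueError:
        return False


def _parse_bool(text):
    """Accept only the literals 'true' and 'false' (case-insensitive)."""
    if text.lower() not in ("true", "false"):
        raise ValueError(text)
    return text.lower() == "true"


# Candidate types, tried in this order (a hypothetical ordering).
CANDIDATES = [
    ("integer", int),
    ("number", float),
    ("boolean", _parse_bool),
    ("date", date.fromisoformat),
]


def infer_type(values, sample_rows=100, confidence=0.9):
    """Pick the first type that at least `confidence` of the sample parses as."""
    sample = values[:sample_rows]
    if not sample:
        return "string"
    for name, parser in CANDIDATES:
        hits = sum(_parses(parser, value) for value in sample)
        if hits / len(sample) >= confidence:
            return name
    return "string"  # fall back when no candidate is confident enough


columns = {
    "id": ["1", "2", "3"],
    "price": ["10.50", "25.00", "15.75"],
    "date": ["2023-01-15", "2023-02-20", "2023-03-25"],
    "active": ["true", "false", "true"],
}
inferred = {name: infer_type(values) for name, values in columns.items()}
print(inferred)
```

Lowering confidence makes the sketch tolerate more unparseable values in a column before it falls back to string.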
For the example table above, infer_table_schema_from_table automatically detects integer, number, date, and boolean types.
Table Normalization
Convert table data to match a Table Schema (type conversion):
import polars as pl
from fairspec import normalize_table
from fairspec_metadata import TableSchema, IntegerColumnProperty, NumberColumnProperty, BooleanColumnProperty, DateColumnProperty

table = pl.DataFrame({
    "id": ["1", "2", "3"],
    "price": ["10.50", "25.00", "15.75"],
    "active": ["true", "false", "true"],
    "date": ["2023-01-15", "2023-02-20", "2023-03-25"],
}).lazy()

schema = TableSchema(properties={
    "id": IntegerColumnProperty(),
    "price": NumberColumnProperty(),
    "active": BooleanColumnProperty(),
    "date": DateColumnProperty(),
})

normalized = normalize_table(table, schema)
result = normalized.collect()

# Result has properly typed columns:
# { id: 1, price: 10.50, active: True, date: Date("2023-01-15") }
Table Denormalization
Convert normalized data back to raw format (for saving):
from fairspec import denormalize_table

denormalized = denormalize_table(table, schema, native_types=["string", "number", "boolean"])
Advanced Features
Working with Table Schema
Define schemas with column properties and constraints:
from fairspec_metadata import TableSchema, IntegerColumnProperty, StringColumnProperty

schema = TableSchema(
    properties={
        "id": IntegerColumnProperty(minimum=1),
        "name": StringColumnProperty(minLength=1, maxLength=100),
        "email": StringColumnProperty(pattern=r"^[^@]+@[^@]+\.[^@]+$"),
        "age": IntegerColumnProperty(minimum=0, maximum=150),
        "status": StringColumnProperty(enum=["active", "inactive", "pending"]),
    },
    required=["id", "name", "email"],
    primaryKey=["id"],
)
Schema Inference Options
Customize how schemas are inferred:
from fairspec import infer_table_schema_from_table

schema = infer_table_schema_from_table(
    table,
    sample_rows=100,
    confidence=0.9,
    keep_strings=False,
    column_types={"id": "integer", "status": "categorical"},
)
Handling Missing Values
Define missing value indicators:
from fairspec_metadata import TableSchema, NumberColumnProperty

schema = TableSchema(
    properties={"value": NumberColumnProperty()},
    missingValues=["", "N/A", "null", -999],
)
Primary Keys and Constraints
Define table-level constraints:
from fairspec_metadata import TableSchema, IntegerColumnProperty, StringColumnProperty, UniqueKey

schema = TableSchema(
    properties={
        "user_id": IntegerColumnProperty(),
        "email": StringColumnProperty(),
    },
    primaryKey=["user_id"],
    uniqueKeys=[UniqueKey(columnNames=["email"])],
)
Supported Column Types
Primitive Types
string - Text data
integer - Whole numbers
number - Decimal numbers
boolean - True/false values
Temporal Types
date - Calendar dates
datetime - Date and time
time - Time of day
duration - Time spans
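As a rough illustration of what values of these four types look like, the sketch below parses ISO 8601 strings with Python's standard library. It assumes ISO 8601 lexical forms, which the source does not confirm for fairspec. parse_duration is a hypothetical helper that handles only a simple PnDTnHnMnS subset of ISO 8601 durations.

```python
import re
from datetime import date, datetime, time, timedelta

# date, datetime, and time map directly onto standard-library parsers.
d = date.fromisoformat("2023-01-15")
dt = datetime.fromisoformat("2023-01-15T08:30:00")
t = time.fromisoformat("08:30:00")

# Simplified duration pattern: optional days, then optional H/M/S after "T".
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Parse a small subset of ISO 8601 durations into a timedelta."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {key: int(val) for key, val in match.groupdict().items() if val is not None}
    if not parts:
        raise ValueError(f"empty duration: {text!r}")
    return timedelta(**parts)


span = parse_duration("P1DT2H30M")
print(d, dt, t, span)
```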
Spatial Types
geojson - GeoJSON geometries
wkt - Well-Known Text geometries
wkb - Well-Known Binary geometries
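To make the three encodings concrete, the sketch below writes the same point (x = 30, y = 10) as GeoJSON, WKT, and WKB using only the standard library. The WKB layout used here (a byte-order flag, a 32-bit geometry type, then two 64-bit floats) is the standard encoding of a 2D point. How fairspec stores or validates these values is not covered by the source, and decode_wkb_point is a hypothetical helper.

```python
import json
import struct

# GeoJSON: a JSON object with "type" and "coordinates".
geojson_point = json.dumps({"type": "Point", "coordinates": [30.0, 10.0]})

# WKT: a human-readable text form.
wkt_point = "POINT (30 10)"

# WKB: 1 = little-endian flag, 1 = Point geometry type, then x and y as doubles.
wkb_point = struct.pack("<BIdd", 1, 1, 30.0, 10.0)


def decode_wkb_point(data):
    """Decode a 2D WKB point into an (x, y) tuple."""
    byte_order = "<" if data[0] == 1 else ">"
    geometry_type, x, y = struct.unpack(byte_order + "Idd", data[1:21])
    if geometry_type != 1:
        raise ValueError("not a WKB point")
    return x, y


print(geojson_point)
print(wkt_point)
print(wkb_point.hex(), decode_wkb_point(wkb_point))
```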
Complex Types
array - Fixed-length arrays
list - Variable-length lists
object - JSON objects
Specialized Types
email - Email addresses
url - URLs
categorical - Categorical data
base64 - Base64 encoded data
hex - Hexadecimal data
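The following sketch shows, in plain Python, the kind of check each specialized type implies. These are deliberately loose, standard-library checks and not fairspec's validation rules; the function names are hypothetical, and the email pattern is the one used in the schema example earlier in this document.

```python
import base64
import binascii
import re
from urllib.parse import urlsplit

EMAIL_PATTERN = re.compile(r"^[^@]+@[^@]+\.[^@]+$")


def is_email(text):
    """Loose check: something@something.something."""
    return EMAIL_PATTERN.match(text) is not None


def is_url(text):
    """Require both a scheme and a network location."""
    try:
        parts = urlsplit(text)
    except ValueError:
        return False
    return bool(parts.scheme and parts.netloc)


def is_base64(text):
    """Strict decode: reject characters outside the Base64 alphabet."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


def is_hex(text):
    """bytes.fromhex needs an even number of hexadecimal digits."""
    try:
        bytes.fromhex(text)
        return True
    except ValueError:
        return False


def is_category(text, categories):
    """A categorical value must be one of a fixed set of labels."""
    return text in categories


print(is_email("ada@example.com"), is_url("https://example.com/data.csv"))
print(is_base64("aGVsbG8="), is_hex("deadbeef"))
print(is_category("active", {"active", "inactive", "pending"}))
```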
Table Type
The package uses LazyFrame from Polars for efficient processing:
import polars as pl
from fairspec_table import Table

# Table is an alias for pl.LazyFrame
table: Table = pl.DataFrame({"id": [1, 2, 3]}).lazy()