Working with JSONL tables in Python
JSONL (JSON Lines) file handling with automatic format detection and high-performance data operations.
Installation
pip install fairspec
Getting Started

The JSONL format is handled by the JSON plugin, which provides:

- load_json_table: Load JSONL files into tables
- save_json_table: Save tables to JSONL files
- JsonPlugin: Plugin for framework integration
For example:
from fairspec import load_json_table, Resource
table = load_json_table(Resource(data="table.jsonl"))
# Newline-delimited JSON objects

Basic Usage
Loading JSONL Files

from fairspec import load_json_table, Resource
from fairspec_metadata import JsonFileDialect

# Load from local file
table = load_json_table(Resource(data="data.jsonl"))

# Load with explicit format
table = load_json_table(Resource(
    data="data.jsonl",
    fileDialect=JsonFileDialect(format="jsonl"),
))

# Load multiple files (concatenated)
table = load_json_table(Resource(data=["part1.jsonl", "part2.jsonl"]))
Saving JSONL Files

from fairspec import save_json_table
from fairspec_metadata import JsonFileDialect

# Save as JSONL
save_json_table(table, path="output.jsonl", fileDialect=JsonFileDialect(format="jsonl"))
Standard Format

JSONL uses newline-delimited JSON objects:

{"id": 1, "name": "Alice", "age": 30}
{"id": 2, "name": "Bob", "age": 25}
{"id": 3, "name": "Charlie", "age": 35}
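To make the layout concrete, here is a minimal standard-library sketch that parses the three records above one line at a time. It does not use fairspec and says nothing about how the JSON plugin is implemented; the helper name parse_jsonl is hypothetical.

```python
import json


def parse_jsonl(text: str) -> list[dict]:
    """Parse newline-delimited JSON: one JSON value per non-empty line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # tolerate blank lines, e.g. after a trailing newline
            continue
        records.append(json.loads(line))
    return records


sample = (
    '{"id": 1, "name": "Alice", "age": 30}\n'
    '{"id": 2, "name": "Bob", "age": 25}\n'
    '{"id": 3, "name": "Charlie", "age": 35}\n'
)

records = parse_jsonl(sample)
print(records[0])
```

Because every line is a complete JSON value on its own, a file can be read or appended to one record at a time, which a single top-level JSON array does not allow.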
Advanced Features

Column Selection

Select specific columns using columnNames:

from fairspec import load_json_table, Resource
from fairspec_metadata import JsonFileDialect

# Only load specific columns
table = load_json_table(Resource(
    data="data.jsonl",
    fileDialect=JsonFileDialect(format="jsonl", columnNames=["name", "age"]),
))
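The effect of selecting columns can be pictured with plain dictionaries. This is only a sketch, under the assumption that columnNames keeps the listed keys in the listed order and drops everything else; the helper select_columns is hypothetical and is not part of fairspec.

```python
import json


def select_columns(records, column_names):
    """Keep only the requested keys of each record, in the requested order."""
    # Assumption: a key that is missing from a record becomes None.
    return [{name: rec.get(name) for name in column_names} for rec in records]


lines = [
    '{"id": 1, "name": "Alice", "age": 30}',
    '{"id": 2, "name": "Bob", "age": 25}',
]
records = [json.loads(line) for line in lines]

selected = select_columns(records, ["name", "age"])
print(selected)
```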
Array Format Handling

Handle CSV-style array data, where each line is a JSON array instead of an object, by setting rowType="array":

from fairspec import load_json_table, Resource
from fairspec_metadata import JsonFileDialect

# Input JSONL with arrays:
# ["id", "name"]
# [1, "Alice"]
# [2, "Bob"]

table = load_json_table(Resource(
    data="data.jsonl",
    fileDialect=JsonFileDialect(format="jsonl", rowType="array"),
))
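Conceptually, array rows map onto records by pairing each value with a column name. The sketch below uses only the standard library and assumes, as the commented input above suggests, that the first array holds the column names; rows_to_records is a hypothetical helper, not a fairspec function.

```python
import json


def rows_to_records(lines):
    """Turn array-style JSONL lines into dict records.

    Assumption: the first non-empty line is the header row.
    """
    rows = [json.loads(line) for line in lines if line.strip()]
    if not rows:
        return []
    header, *data = rows
    # zip stops at the shorter sequence, so surplus values in a row are dropped.
    return [dict(zip(header, row)) for row in data]


array_lines = [
    '["id", "name"]',
    '[1, "Alice"]',
    '[2, "Bob"]',
]

records = rows_to_records(array_lines)
print(records)
```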
Remote File Loading

from fairspec import load_json_table, Resource

# Load from URL
table = load_json_table(Resource(data="https://example.com/data.jsonl"))

# Load multiple remote files
table = load_json_table(Resource(data=[
    "https://api.example.com/logs-2023.jsonl",
    "https://api.example.com/logs-2024.jsonl",
]))
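As a rough mental model of loading several sources into one table, concatenating JSONL inputs amounts to chaining their lines in order. The sketch below says nothing about how fairspec fetches or merges data: it uses in-memory io.StringIO objects as stand-ins for the files, the records are made-up sample data, and iter_records is a hypothetical helper.

```python
import io
import json
from itertools import chain


def iter_records(sources):
    """Yield records from several JSONL text streams, in source order."""
    for line in chain.from_iterable(sources):
        if line.strip():  # skip blank lines
            yield json.loads(line)


# Stand-ins for two files; the contents are invented for illustration.
part1 = io.StringIO('{"year": 2023, "event": "start"}\n')
part2 = io.StringIO(
    '{"year": 2024, "event": "stop"}\n'
    '{"year": 2024, "event": "restart"}\n'
)

combined = list(iter_records([part1, part2]))
print(len(combined))
```

Because JSONL has no enclosing array or header, the sources can be joined without any rewriting, provided each one ends with a newline.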