FR: Infer schema from multiline (multiple docs) JSON files #5603

@livelace

Description

datahub version: v0.8.41

Currently there is no way to infer a JSON schema from a file that contains multiple lines/docs.

{"id":1, "text": "foo"}
{"id":2, "text": "bar"}

It's a very convenient way to put many JSON docs into a single file, and it keeps datasets well organized and easy to manage.

Currently, inferring the schema of such a file produces an error:

could not infer schema for file s3://path/to/file.json: ' 'Trailing data']
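
For illustration, here is a minimal sketch of the requested behaviour, assuming each non-empty line holds one complete JSON object. It uses only the Python standard library; `infer_jsonl_schema` is a hypothetical helper, not a DataHub function, and it looks at top-level fields only.

```python
import json
from collections import defaultdict


def infer_jsonl_schema(lines):
    """Map each top-level field to the sorted Python type names seen for it.

    Hypothetical helper for illustration; not part of DataHub.
    """
    field_types = defaultdict(set)
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between documents
        try:
            # Parse one document per line instead of the whole file at once.
            doc = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON: {exc}") from exc
        if not isinstance(doc, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        for key, value in doc.items():
            field_types[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in field_types.items()}


sample = [
    '{"id":1, "text": "foo"}',
    '{"id":2, "text": "bar"}',
]
schema = infer_jsonl_schema(sample)
print(schema)  # {'id': ['int'], 'text': ['str']}
```

Parsing line by line avoids handing the parser more than one document at a time, which is what a whole-file parse of the sample above would do.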

Slack conversation

Metadata

    Labels

    bug, ingestion, stale
