
plot.histogram.bins ignored for low-cardinality numeric features even when low_categorical_threshold = 0 #1753

@JSBaxter

Current Behaviour

Even with plot.histogram.bins set explicitly and vars.num.low_categorical_threshold = 0, YData Profiling does not honour the configured number of bins for numeric features with low cardinality (e.g. [0.2, 0.4, ..., 1.0]).

This results in bar plots or histograms with fewer bins than requested, even though numeric treatment is forced. This looks like a bug, since user configuration should override internal heuristics. The behaviour appears to come from the following line, which caps the bin count at the number of unique values (n_unique):

bins_arg = "auto" if hist_config.bins == 0 else min(hist_config.bins, n_unique)
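To make the effect of that expression concrete, here is a minimal sketch that evaluates it in isolation. `resolve_bins` is a hypothetical helper that only copies the expression quoted above; it is not a function from ydata-profiling, and the surrounding library code is not reproduced.

```python
def resolve_bins(configured_bins, n_unique):
    """Hypothetical helper: mirrors the quoted expression, nothing more."""
    return "auto" if configured_bins == 0 else min(configured_bins, n_unique)


# Same data as in the reproduction below: five distinct values, 100 rows.
values = [0.2, 0.4, 0.6, 0.8, 1.0] * 20
n_unique = len(set(values))

print(resolve_bins(30, n_unique))  # 5: the requested 30 bins are capped at n_unique
print(resolve_bins(0, n_unique))   # auto
```

With 30 bins requested and only five unique values, the expression yields 5, which matches the behaviour described in this report.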

Expected Behaviour

I would expect an explicitly configured number of bins to be respected.

Alternatively, there should at least be an additional configuration variable, e.g. bins_override, that lets the user state the number of bins explicitly and bypass the heuristics.

Instead of:

bins_arg = "auto" if hist_config.bins == 0 else min(hist_config.bins, n_unique)

it would be nice to have something like:

if hist_config.bins_override:
    bins_arg = hist_config.bins_override
else:
    bins_arg = "auto" if hist_config.bins == 0 else min(hist_config.bins, n_unique)
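As a self-contained illustration of the proposal, the sketch below wraps the same logic in a function that can be run and tested on its own. Everything here is an assumption for illustration: `HistogramSettings` is a hypothetical stand-in for the histogram config, and `bins_override` is the proposed field, which does not exist in ydata-profiling. The sketch checks `is not None` rather than truthiness, which is one possible design choice and differs slightly from the snippet above.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class HistogramSettings:
    """Hypothetical stand-in for the histogram config (not the library's class)."""
    bins: int
    bins_override: Optional[int] = None  # proposed field, assumed for this sketch


def resolve_bins(cfg: HistogramSettings, n_unique: int) -> Union[int, str]:
    # An explicit override wins over the cardinality-based cap.
    if cfg.bins_override is not None:
        return cfg.bins_override
    # Otherwise fall back to the expression quoted in this report.
    return "auto" if cfg.bins == 0 else min(cfg.bins, n_unique)


print(resolve_bins(HistogramSettings(bins=30), n_unique=5))                    # 5
print(resolve_bins(HistogramSettings(bins=30, bins_override=30), n_unique=5))  # 30
```

Keeping the override separate from bins would leave the existing behaviour unchanged for anyone who does not set it.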

Data Description

Any numeric dataset in which a column has fewer unique values than the configured number of bins
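For anyone checking whether their own data falls into this category, the following sketch lists the numeric columns whose number of unique values is below a given bin count. `low_cardinality_numeric_columns` is a hypothetical helper written for this report; it uses only standard pandas calls and is not part of ydata-profiling.

```python
import pandas as pd


def low_cardinality_numeric_columns(df: pd.DataFrame, bins: int) -> dict:
    """Hypothetical helper: map column name -> unique count for numeric
    columns with fewer unique values than `bins`."""
    counts = df.select_dtypes(include="number").nunique()
    return {name: int(n) for name, n in counts.items() if n < bins}


df = pd.DataFrame({
    "x": [0.2, 0.4, 0.6, 0.8, 1.0] * 20,  # 5 unique values
    "y": range(100),                      # 100 unique values
})

print(low_cardinality_numeric_columns(df, bins=30))  # {'x': 5}
```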

Code that reproduces the bug

import pandas as pd
from ydata_profiling import ProfileReport

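# Five distinct values repeated 20 times (100 rows)
df = pd.DataFrame({"x": [0.2, 0.4, 0.6, 0.8, 1.0] * 20})

profile = ProfileReport(
    df,
    explorative=True
)
# Force numeric treatment and request 30 histogram bins
profile.config.vars.num.low_categorical_threshold = 0
profile.config.plot.histogram.bins = 30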

profile.to_file("report.html")

pandas-profiling version

v4.16.1

Dependencies

ydata-profiling==4.16.1

OS

No response

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.
