
Error in prepared DataLoader with BatchSampler #679

@etiennebeaulac

Description

System Info

accelerate: 0.12.0
OS: Linux 5.4.188+ (Colab)
Python: 3.7.13
numpy: 1.21.6
torch: 1.12.1+cu113
config: 1 CPU

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

MRE: https://colab.research.google.com/drive/17krCJCF_nWtNFSiMBo3oz12l7eX1bBZ6

First of all, thanks for this library and the great docs and examples that come with it 😄!

I am using a custom torch Dataset that wraps a Hugging Face Dataset (pyarrow) instance. Therefore, as recommended in the Datasets docs (https://huggingface.co/docs/datasets/v2.4.0/en/use_with_pytorch#use-a-batchsampler), I tried to use a BatchSampler to reduce the number of queries to the underlying table, as sketched below. However, I have not yet been able to make it work with accelerate.
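For reference, here is a minimal sketch of the pattern the Datasets docs describe (class and variable names are illustrative, not my actual code): passing the `BatchSampler` as `sampler` with `batch_size=None` makes PyTorch hand a whole list of indices to `__getitem__`, so the pyarrow table is queried once per batch instead of once per example.

```python
import torch
from datasets import load_dataset
from torch.utils.data import BatchSampler, DataLoader, RandomSampler

class HFWrapperDataset(torch.utils.data.Dataset):
    """Custom torch Dataset wrapping a Hugging Face (pyarrow-backed) Dataset."""

    def __init__(self, hf_dataset):
        self.hf_dataset = hf_dataset

    def __len__(self):
        return len(self.hf_dataset)

    def __getitem__(self, indices):
        # With batch_size=None below, `indices` is a list of ints, so this
        # is a single pyarrow query that returns a whole batch.
        return self.hf_dataset[indices]

hf_ds = load_dataset("imdb", split="train")  # any pyarrow-backed dataset
dataset = HFWrapperDataset(hf_ds)

batch_sampler = BatchSampler(RandomSampler(dataset), batch_size=32, drop_last=False)
# batch_size=None disables automatic batching, so each index list produced
# by the sampler is forwarded to __getitem__ as-is.
dataloader = DataLoader(dataset, sampler=batch_sampler, batch_size=None)
```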

I tried several different approaches; one of them works on CPU or a single GPU, but gets stuck when using distributed training.
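The failing setup is roughly the following (a sketch of the shape of the problem, not the exact notebook code; see the MRE above for the full reproduction):

```python
from accelerate import Accelerator

accelerator = Accelerator()
# `dataloader` is the BatchSampler-based DataLoader from the sketch above.
dataloader = accelerator.prepare(dataloader)

# Runs fine on CPU or a single GPU; in a multi-process launch
# (`accelerate launch ...`), iteration gets stuck instead.
for batch in dataloader:
    pass
```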

Thanks for your help!
