
Error in prepared DataLoader with BatchSampler #679

@etiennebeaulac

Description

System Info

accelerate: 0.12.0
OS: Linux 5.4.188+ (Colab)
Python: 3.7.13
numpy: 1.21.6
torch: 1.12.1+cu113
config: 1 CPU

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

MRE: https://colab.research.google.com/drive/17krCJCF_nWtNFSiMBo3oz12l7eX1bBZ6

First of all, thanks for this library and the great docs and examples that come with it 😄!

I am using a custom torch Dataset that wraps a Hugging Face Dataset (pyarrow) instance. Therefore, as recommended in the Datasets docs (https://huggingface.co/docs/datasets/v2.4.0/en/use_with_pytorch#use-a-batchsampler), I tried to use a BatchSampler to reduce the number of queries to the underlying table, as sketched below. However, I have not yet been able to make it work with accelerate.
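For reference, here is a minimal sketch of the pattern the Datasets docs describe (class and variable names are illustrative, not my actual code): passing the `BatchSampler` as `sampler` with `batch_size=None` makes PyTorch hand a whole list of indices to `__getitem__`, so the pyarrow table is queried once per batch instead of once per example.

```python
import torch
from datasets import load_dataset
from torch.utils.data import BatchSampler, DataLoader, RandomSampler

class HFWrapperDataset(torch.utils.data.Dataset):
    """Custom torch Dataset wrapping a Hugging Face (pyarrow-backed) Dataset."""

    def __init__(self, hf_dataset):
        self.hf_dataset = hf_dataset

    def __len__(self):
        return len(self.hf_dataset)

    def __getitem__(self, indices):
        # With batch_size=None below, `indices` is a list of ints, so this
        # is a single pyarrow query that returns a whole batch.
        return self.hf_dataset[indices]

hf_ds = load_dataset("imdb", split="train")  # any pyarrow-backed dataset
dataset = HFWrapperDataset(hf_ds)

batch_sampler = BatchSampler(RandomSampler(dataset), batch_size=32, drop_last=False)
# batch_size=None disables automatic batching, so each index list produced
# by the sampler is forwarded to __getitem__ as-is.
dataloader = DataLoader(dataset, sampler=batch_sampler, batch_size=None)
```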

I tried several different approaches; one of them works on CPU or a single GPU, but gets stuck when using distributed training.
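The failing setup is roughly the following (a sketch of the shape of the problem, not the exact notebook code; see the MRE above for the full reproduction):

```python
from accelerate import Accelerator

accelerator = Accelerator()
# `dataloader` is the BatchSampler-based DataLoader from the sketch above.
dataloader = accelerator.prepare(dataloader)

# Runs fine on CPU or a single GPU; in a multi-process launch
# (`accelerate launch ...`), iteration gets stuck instead.
for batch in dataloader:
    pass
```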

Thanks for your help!
