P.256 of Generative Deep Learning 2nd Edition - David Foster
https://towardsdatascience.com/how-to-build-an-llm-from-scratch-8c477768f1f9
allenai/allennlp#5056
https://support.terra.bio/hc/en-us/community/posts/4787320149915-Requester-Pays-Google-buckets-not-asking-for-project-to-bill
- Google Project Gemini https://blog.google/technology/ai/google-io-2023-keynote-sundar-pichai/#ai-products
C4 = Colossal Clean Crawled Corpus
Start 2023-12-03 00:21. Estimated cost: about $100 US for GCS egress.
Planned: an average of 300 Mbps with peaks of 900 Mbps from the GCP bucket means 800 GB x 8 bits = 6400 Gbit at 0.3 Gbps ≈ 6 hours ETA.
Observed: 36 GB in 26 min ≈ 23 MB/s ≈ 185 Mbps, so roughly 10 to 11 h for the full 812.4 GiB (possibly limited by the HDD; go directly to NVMe next time).
- Checked at 08:45: done
- Copy test, HDD to HDD (no RAID): 08:49 to ~13:30 = 4.5 h
- HDD to NVMe: 14:00 to 14:55, 250 MB/s, ~1 h
- Copy test, NVMe to NVMe: started 14:56, 4-8 min at 3.4-1.4 GB/s (thermal throttling; 990 Pro at about 50% of the 8 GB/s max)
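The ETA arithmetic in the notes above can be reproduced with a small helper. This is only a sketch: the function name `transfer_hours` is made up for illustration, and the sizes and rates are the figures quoted in the notes (800 GB planned, 36 GB in 26 min observed, 812.4 GiB reported by gsutil).

```python
def transfer_hours(size_bytes: float, rate_bits_per_s: float) -> float:
    """Hours needed to move size_bytes at a sustained rate in bits per second."""
    return size_bytes * 8 / rate_bits_per_s / 3600


GB = 1e9       # decimal gigabyte
GIB = 2 ** 30  # binary gibibyte, the unit gsutil reports

# Planning estimate from the notes: 800 GB at an average of 300 Mbps.
planned = transfer_hours(800 * GB, 300e6)

# Observed sample from the notes: 36 GB in 26 minutes, as bits per second.
observed_rate = 36 * GB * 8 / (26 * 60)

# Extrapolate the observed rate to the full size reported by gsutil.
extrapolated = transfer_hours(812.4 * GIB, observed_rate)

print(f"planned ETA:      {planned:.1f} h")
print(f"observed rate:    {observed_rate / 1e6:.0f} Mbps")
print(f"extrapolated ETA: {extrapolated:.1f} h")
```

Keeping GB and GiB separate matters here: the 800 GB planning figure and the 812.4 GiB that gsutil reports differ by about 9%, which accounts for part of the gap between the two ETAs.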
E:\c4\c4\en>gsutil -m -u your-project-id cp "gs://allennlp-tensorflow-datasets/c4/en/3.0.1/*" .
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00002-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00013-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00018-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00006-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00015-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00001-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00008-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00017-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00020-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00003-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00009-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00000-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00016-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00021-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00019-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00004-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00010-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00023-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00022-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00007-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00014-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00011-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00005-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-00012-of-01024...
/ [0/1.0k files][ 0.0 B/812.4 GiB] 0% Done
0845
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-train.tfrecord-01023-of-01024...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-validation.tfrecord-00000-of-00008...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-validation.tfrecord-00001-of-00008...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-validation.tfrecord-00002-of-00008...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-validation.tfrecord-00003-of-00008...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-validation.tfrecord-00004-of-00008...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-validation.tfrecord-00005-of-00008...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-validation.tfrecord-00006-of-00008...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/c4-validation.tfrecord-00007-of-00008...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/dataset_info.json...
Copying gs://allennlp-tensorflow-datasets/c4/en/3.0.1/features.json...
\ [1.0k/1.0k files][812.4 GiB/812.4 GiB] 100% Done 97.1 MiB/s ETA 00:00:00
Operation completed over 1.0k objects/812.4 GiB.
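A quick way to confirm that nothing was skipped is to compare the local directory against the file names seen in the log. The sketch below assumes the bucket holds exactly what the log shows: 1024 train shards, 8 validation shards, `dataset_info.json`, and `features.json`. The helper names `expected_shards` and `missing_files` are hypothetical.

```python
from pathlib import Path


def expected_shards(prefix: str, total: int) -> list[str]:
    """Shard names following the pattern in the gsutil log, e.g.
    c4-train.tfrecord-00002-of-01024."""
    return [f"{prefix}.tfrecord-{i:05d}-of-{total:05d}" for i in range(total)]


def missing_files(directory: str) -> list[str]:
    """Return expected file names that are not present in directory."""
    # Assumption: this list is inferred from the log, not from a bucket manifest.
    expected = (
        expected_shards("c4-train", 1024)
        + expected_shards("c4-validation", 8)
        + ["dataset_info.json", "features.json"]
    )
    present = {p.name for p in Path(directory).iterdir() if p.is_file()}
    return [name for name in expected if name not in present]
```

Calling `missing_files(r"E:\c4\c4\en")` on the download directory used above should return an empty list if all 1034 files arrived. It checks names only, not sizes or checksums.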