Model Loading #246

@ArijitSinghEDA

I am loading the model like this:

import sparknlp
import nlu

spark = sparknlp.start()
df = spark.read.csv("nlp_data.csv")
res = nlu.load("pos").predict(df[["text"]].rdd.flatMap(lambda x: x).collect())
print(res)
spark.stop()

Each time I get the following messages in my console:

com.johnsnowlabs.nlp#spark-nlp_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-3f17e4b8-0bdf-40c5-9879-d62f9c2dc974;1.0
        confs: [default]
        found com.johnsnowlabs.nlp#spark-nlp_2.12;5.2.3 in central
        found com.typesafe#config;1.4.2 in central
        found org.rocksdb#rocksdbjni;6.29.5 in central
        found com.amazonaws#aws-java-sdk-s3;1.12.500 in central
        found com.amazonaws#aws-java-sdk-kms;1.12.500 in central
        found com.amazonaws#aws-java-sdk-core;1.12.500 in central
        found commons-logging#commons-logging;1.1.3 in central
        found commons-codec#commons-codec;1.15 in central
        found org.apache.httpcomponents#httpclient;4.5.13 in central
        found org.apache.httpcomponents#httpcore;4.4.13 in central
        found software.amazon.ion#ion-java;1.0.2 in central
        found com.fasterxml.jackson.dataformat#jackson-dataformat-cbor;2.12.6 in central
        found joda-time#joda-time;2.8.1 in central
        found com.amazonaws#jmespath-java;1.12.500 in central
        found com.github.universal-automata#liblevenshtein;3.0.0 in central
        found com.google.protobuf#protobuf-java-util;3.0.0-beta-3 in central
        found com.google.protobuf#protobuf-java;3.0.0-beta-3 in central
        found com.google.code.gson#gson;2.3 in central
        found it.unimi.dsi#fastutil;7.0.12 in central
        found org.projectlombok#lombok;1.16.8 in central
        found com.google.cloud#google-cloud-storage;2.20.1 in central
        found com.google.guava#guava;31.1-jre in central
        found com.google.guava#failureaccess;1.0.1 in central
        found com.google.guava#listenablefuture;9999.0-empty-to-avoid-conflict-with-guava in central
        found com.google.errorprone#error_prone_annotations;2.18.0 in central
        found com.google.j2objc#j2objc-annotations;1.3 in central
        found com.google.http-client#google-http-client;1.43.0 in central
        found io.opencensus#opencensus-contrib-http-util;0.31.1 in central
        found com.google.http-client#google-http-client-jackson2;1.43.0 in central
        found com.google.http-client#google-http-client-gson;1.43.0 in central
        found com.google.api-client#google-api-client;2.2.0 in central
        found com.google.oauth-client#google-oauth-client;1.34.1 in central
        found com.google.http-client#google-http-client-apache-v2;1.43.0 in central
        found com.google.apis#google-api-services-storage;v1-rev20220705-2.0.0 in central
        found com.google.code.gson#gson;2.10.1 in central
        found com.google.cloud#google-cloud-core;2.12.0 in central
        found io.grpc#grpc-context;1.53.0 in central
        found com.google.auto.value#auto-value-annotations;1.10.1 in central
        found com.google.auto.value#auto-value;1.10.1 in central
        found javax.annotation#javax.annotation-api;1.3.2 in central
        found com.google.cloud#google-cloud-core-http;2.12.0 in central
        found com.google.http-client#google-http-client-appengine;1.43.0 in central
        found com.google.api#gax-httpjson;0.108.2 in central
        found com.google.cloud#google-cloud-core-grpc;2.12.0 in central
        found io.grpc#grpc-alts;1.53.0 in central
        found io.grpc#grpc-grpclb;1.53.0 in central
        found org.conscrypt#conscrypt-openjdk-uber;2.5.2 in central
        found io.grpc#grpc-auth;1.53.0 in central
        found io.grpc#grpc-protobuf;1.53.0 in central
        found io.grpc#grpc-protobuf-lite;1.53.0 in central
        found io.grpc#grpc-core;1.53.0 in central
        found com.google.api#gax;2.23.2 in central
        found com.google.api#gax-grpc;2.23.2 in central
        found com.google.auth#google-auth-library-credentials;1.16.0 in central
        found com.google.auth#google-auth-library-oauth2-http;1.16.0 in central
        found com.google.api#api-common;2.6.2 in central
        found io.opencensus#opencensus-api;0.31.1 in central
        found com.google.api.grpc#proto-google-iam-v1;1.9.2 in central
        found com.google.protobuf#protobuf-java;3.21.12 in central
        found com.google.protobuf#protobuf-java-util;3.21.12 in central
        found com.google.api.grpc#proto-google-common-protos;2.14.2 in central
        found org.threeten#threetenbp;1.6.5 in central
        found com.google.api.grpc#proto-google-cloud-storage-v2;2.20.1-alpha in central
        found com.google.api.grpc#grpc-google-cloud-storage-v2;2.20.1-alpha in central
        found com.google.api.grpc#gapic-google-cloud-storage-v2;2.20.1-alpha in central
        found com.fasterxml.jackson.core#jackson-core;2.14.2 in central
        found com.google.code.findbugs#jsr305;3.0.2 in central
        found io.grpc#grpc-api;1.53.0 in central
        found io.grpc#grpc-stub;1.53.0 in central
        found org.checkerframework#checker-qual;3.31.0 in central
        found io.perfmark#perfmark-api;0.26.0 in central
        found com.google.android#annotations;4.1.1.4 in central
        found org.codehaus.mojo#animal-sniffer-annotations;1.22 in central
        found io.opencensus#opencensus-proto;0.2.0 in central
        found io.grpc#grpc-services;1.53.0 in central
        found com.google.re2j#re2j;1.6 in central
        found io.grpc#grpc-netty-shaded;1.53.0 in central
        found io.grpc#grpc-googleapis;1.53.0 in central
        found io.grpc#grpc-xds;1.53.0 in central
        found com.navigamez#greex;1.0 in central
        found dk.brics.automaton#automaton;1.11-8 in central
        found com.johnsnowlabs.nlp#tensorflow-cpu_2.12;0.4.4 in central
        found com.microsoft.onnxruntime#onnxruntime;1.16.3 in central
:: resolution report :: resolve 1966ms :: artifacts dl 54ms
        :: modules in use:
        com.amazonaws#aws-java-sdk-core;1.12.500 from central in [default]
        com.amazonaws#aws-java-sdk-kms;1.12.500 from central in [default]
        com.amazonaws#aws-java-sdk-s3;1.12.500 from central in [default]
        com.amazonaws#jmespath-java;1.12.500 from central in [default]
        com.fasterxml.jackson.core#jackson-core;2.14.2 from central in [default]
        com.fasterxml.jackson.dataformat#jackson-dataformat-cbor;2.12.6 from central in [default]
        com.github.universal-automata#liblevenshtein;3.0.0 from central in [default]
        com.google.android#annotations;4.1.1.4 from central in [default]
        com.google.api#api-common;2.6.2 from central in [default]
        com.google.api#gax;2.23.2 from central in [default]
        com.google.api#gax-grpc;2.23.2 from central in [default]
        com.google.api#gax-httpjson;0.108.2 from central in [default]
        com.google.api-client#google-api-client;2.2.0 from central in [default]
        com.google.api.grpc#gapic-google-cloud-storage-v2;2.20.1-alpha from central in [default]
        com.google.api.grpc#grpc-google-cloud-storage-v2;2.20.1-alpha from central in [default]
        com.google.api.grpc#proto-google-cloud-storage-v2;2.20.1-alpha from central in [default]
        com.google.api.grpc#proto-google-common-protos;2.14.2 from central in [default]
        com.google.api.grpc#proto-google-iam-v1;1.9.2 from central in [default]
        com.google.apis#google-api-services-storage;v1-rev20220705-2.0.0 from central in [default]
        com.google.auth#google-auth-library-credentials;1.16.0 from central in [default]
        com.google.auth#google-auth-library-oauth2-http;1.16.0 from central in [default]
        com.google.auto.value#auto-value;1.10.1 from central in [default]
        com.google.auto.value#auto-value-annotations;1.10.1 from central in [default]
        com.google.cloud#google-cloud-core;2.12.0 from central in [default]
        com.google.cloud#google-cloud-core-grpc;2.12.0 from central in [default]
        com.google.cloud#google-cloud-core-http;2.12.0 from central in [default]
        com.google.cloud#google-cloud-storage;2.20.1 from central in [default]
        com.google.code.findbugs#jsr305;3.0.2 from central in [default]
        com.google.code.gson#gson;2.10.1 from central in [default]
        com.google.errorprone#error_prone_annotations;2.18.0 from central in [default]
        com.google.guava#failureaccess;1.0.1 from central in [default]
        com.google.guava#guava;31.1-jre from central in [default]
        com.google.guava#listenablefuture;9999.0-empty-to-avoid-conflict-with-guava from central in [default]
        com.google.http-client#google-http-client;1.43.0 from central in [default]
        com.google.http-client#google-http-client-apache-v2;1.43.0 from central in [default]
        com.google.http-client#google-http-client-appengine;1.43.0 from central in [default]
        com.google.http-client#google-http-client-gson;1.43.0 from central in [default]
        com.google.http-client#google-http-client-jackson2;1.43.0 from central in [default]
        com.google.j2objc#j2objc-annotations;1.3 from central in [default]
        com.google.oauth-client#google-oauth-client;1.34.1 from central in [default]
        com.google.protobuf#protobuf-java;3.21.12 from central in [default]
        com.google.protobuf#protobuf-java-util;3.21.12 from central in [default]
        com.google.re2j#re2j;1.6 from central in [default]
        com.johnsnowlabs.nlp#spark-nlp_2.12;5.2.3 from central in [default]
        com.johnsnowlabs.nlp#tensorflow-cpu_2.12;0.4.4 from central in [default]
        com.microsoft.onnxruntime#onnxruntime;1.16.3 from central in [default]
        com.navigamez#greex;1.0 from central in [default]
        com.typesafe#config;1.4.2 from central in [default]
        commons-codec#commons-codec;1.15 from central in [default]
        commons-logging#commons-logging;1.1.3 from central in [default]
        dk.brics.automaton#automaton;1.11-8 from central in [default]
        io.grpc#grpc-alts;1.53.0 from central in [default]
        io.grpc#grpc-api;1.53.0 from central in [default]
        io.grpc#grpc-auth;1.53.0 from central in [default]
        io.grpc#grpc-context;1.53.0 from central in [default]
        io.grpc#grpc-core;1.53.0 from central in [default]
        io.grpc#grpc-googleapis;1.53.0 from central in [default]
        io.grpc#grpc-grpclb;1.53.0 from central in [default]
        io.grpc#grpc-netty-shaded;1.53.0 from central in [default]
        io.grpc#grpc-protobuf;1.53.0 from central in [default]
        io.grpc#grpc-protobuf-lite;1.53.0 from central in [default]
        io.grpc#grpc-services;1.53.0 from central in [default]
        io.grpc#grpc-stub;1.53.0 from central in [default]
        io.grpc#grpc-xds;1.53.0 from central in [default]
        io.opencensus#opencensus-api;0.31.1 from central in [default]
        io.opencensus#opencensus-contrib-http-util;0.31.1 from central in [default]
        io.opencensus#opencensus-proto;0.2.0 from central in [default]
        io.perfmark#perfmark-api;0.26.0 from central in [default]
        it.unimi.dsi#fastutil;7.0.12 from central in [default]
        javax.annotation#javax.annotation-api;1.3.2 from central in [default]
        joda-time#joda-time;2.8.1 from central in [default]
        org.apache.httpcomponents#httpclient;4.5.13 from central in [default]
        org.apache.httpcomponents#httpcore;4.4.13 from central in [default]
        org.checkerframework#checker-qual;3.31.0 from central in [default]
        org.codehaus.mojo#animal-sniffer-annotations;1.22 from central in [default]
        org.conscrypt#conscrypt-openjdk-uber;2.5.2 from central in [default]
        org.projectlombok#lombok;1.16.8 from central in [default]
        org.rocksdb#rocksdbjni;6.29.5 from central in [default]
        org.threeten#threetenbp;1.6.5 from central in [default]
        software.amazon.ion#ion-java;1.0.2 from central in [default]
        :: evicted modules:
        commons-logging#commons-logging;1.2 by [commons-logging#commons-logging;1.1.3] in [default]
        commons-codec#commons-codec;1.11 by [commons-codec#commons-codec;1.15] in [default]
        com.google.protobuf#protobuf-java-util;3.0.0-beta-3 by [com.google.protobuf#protobuf-java-util;3.21.12] in [default]
        com.google.protobuf#protobuf-java;3.0.0-beta-3 by [com.google.protobuf#protobuf-java;3.21.12] in [default]
        com.google.code.gson#gson;2.3 by [com.google.code.gson#gson;2.10.1] in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   85  |   0   |   0   |   5   ||   80  |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-3f17e4b8-0bdf-40c5-9879-d62f9c2dc974
        confs: [default]
        0 artifacts copied, 80 already retrieved (0kB/27ms)


pos_anc download started this may take some time.
Approximate size to download 3.9 MB
[ / ]pos_anc download started this may take some time.
Approximate size to download 3.9 MB
[ — ]Download done! Loading the resource.
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[ | ]sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[ / ]Download done! Loading the resource.
[ — ]2024-02-06 14:43:45.340048: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[OK!]

Does this mean that I am downloading the model(s) from the internet again and again, or are they being loaded from the jar files?
I assume the jar files are now on my local system, since it took some time when I first installed spark-nlp, whereas now the jar information is printed almost immediately when I run the code.
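
A minimal sketch for checking what is already stored locally. The Ivy report above shows 0 modules downloaded and "80 already retrieved", which suggests the jars are served from a local cache. The two folder paths below are assumptions about a default setup (`~/cache_pretrained` for pretrained models, `~/.ivy2/jars` for resolved jars) and may differ if the configuration overrides them; `list_cache` and `dir_size_bytes` are hypothetical helper names, not part of spark-nlp or nlu.

```python
from __future__ import annotations

from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Total size of all regular files below path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def list_cache(folder: Path) -> list[tuple[str, int]]:
    """Return (entry name, size in bytes) for each top-level entry, sorted by name.

    Returns an empty list if the folder does not exist.
    """
    if not folder.is_dir():
        return []
    entries = []
    for entry in sorted(folder.iterdir(), key=lambda p: p.name):
        size = dir_size_bytes(entry) if entry.is_dir() else entry.stat().st_size
        entries.append((entry.name, size))
    return entries


if __name__ == "__main__":
    # Assumed default locations; adjust them if your setup uses other folders.
    candidates = {
        "pretrained models": Path.home() / "cache_pretrained",
        "Ivy jars": Path.home() / ".ivy2" / "jars",
    }
    for label, folder in candidates.items():
        print(f"{label}: {folder}")
        for name, size in list_cache(folder):
            print(f"  {name}  ({size / 1024:.1f} KB)")
```

If entries for pos_anc and sentence_detector_dl appear under the model folder and their modification times do not change between runs, that would point to the models being reused rather than fetched again.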
