[Torch FX] Compress PT2E Support #3663
base: develop
Conversation
model: torch.fx.GraphModule,
quantizer: Quantizer,
Please apply linters
…match in signatures in prepare_pt2e.
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py (resolved)
src/nncf/experimental/torch/fx/quantization/quantizer/__init__.py (outdated, resolved)
tests/executorch/observers.py (outdated)
from abc import ABC
from abc import abstractmethod
Please import the observers from Executorch.
Can I see the PR with OpenVINOQuantizer?
from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression


class WeightsCompressionPT2E(Algorithm):
This algorithm is not specific to PT2E; it is an experimental WC algorithm that could be implemented in any backend.
Suggested change:
- class WeightsCompressionPT2E(Algorithm):
+ class WeightCompression(Algorithm):
Should I rename it to ExperimentalWeightCompression instead, since it could be confused with the original?
It is inside the experimental directory; that should be descriptive enough. I suggest the WeightCompression name.
import torch

import nncf  # type: ignore[import-untyped]
Why # type: ignore[import-untyped] here?
I need to update the type-hint ignores since I copied them over from my scripts.
) -> torch.fx.GraphModule:
    self._quantizer = quantizer
Type hints and a docstring are missing.
model,
parameters={
    "mode": self._mode.value,
    "mode": self._mode.value if not isinstance(self._mode, str) else self._mode,
What is the str mode here? Can we force self._mode to always be an enum param?
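If the enum is preferred, a minimal sketch of a normalization helper, assuming nncf.CompressWeightsMode is the enum behind self._mode (the helper name is hypothetical):

```python
import nncf

def _as_compress_weights_mode(mode) -> nncf.CompressWeightsMode:
    # Hypothetical helper: accept either the enum member or its string value
    # and always return the enum, so callers can safely rely on mode.value.
    if isinstance(mode, str):
        return nncf.CompressWeightsMode(mode)
    return mode
```

With the value normalized once at construction time, the parameters dict could keep the original self._mode.value expression.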
if self._sensitivity_metric == nncf.SensitivityMetric.WEIGHT_QUANTIZATION_ERROR:
    # Default case. It means that it is not defined by the user in the API.
    # Hence, the annotation (quantization parameters for all layers) from the quantizer will be used.
    all_weight_params = self._quantizer.get_weight_compression_setup(
        model, graph
    )  # Get weight compression params FROM QUANTIZER
    statistics, statistic_points = self._algo.collect_weight_compression_statistics(
        model, graph, dataset, all_weight_params, statistic_points
    )
else:
    # Data-aware mixed precision is used. In this case, only nodes_to_compress is obtained from the quantizer.
    nodes_to_compress = self._quantizer.get_nodes_to_compress(
        model, graph
    )  # Get nodes to compress FROM QUANTIZER
    all_weight_params, statistics = self._algo.get_weight_compression_parameters(
        model, graph, nodes_to_compress, statistic_points, dataset
    )
This looks really puzzling; please share the OpenVINOQuantizer with me.
…f user passed it or not; Remove init file for tests/executorch; remove init file from nncf openvino quantizer
Can we just pass the dataset param to quantizer.get_nncf_weight_compression_parameters and simplify the pipeline? With that, we don't need the get_nodes_to_compress and collect_weight_compression_statistics methods in the WC algorithm.
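A rough sketch of what the simplified apply() could look like, assuming the wrapper delegates to the inner algorithm via self._algo; get_nncf_weight_compression_parameters and apply_with_compression_parameters are the names proposed in this thread, not existing NNCF API:

```python
def apply(self, model, graph, statistic_points=None, dataset=None):
    # Sketch: the quantizer adapter decides which weights to compress and,
    # when a dataset is given, also collects the statistics it needs, so the
    # WC algorithm no longer exposes get_nodes_to_compress /
    # collect_weight_compression_statistics.
    all_weight_params, statistics = self._quantizer.get_nncf_weight_compression_parameters(
        model, graph, dataset=dataset
    )
    return self._algo.apply_with_compression_parameters(
        model, graph, all_weight_params, statistics, dataset
    )
```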
def get_quantization_setup(self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph) -> SingleConfigQuantizerSetup:
    return self._quantizer.get_nncf_quantization_setup(model, nncf_graph)

def get_weight_compression_setup(
Please do not use the word setup in the context of WC.
Suggested change:
- def get_weight_compression_setup(
+ def get_weight_compression_params(
tests/executorch/test_quantizer.py (outdated)
def get_nncf_weight_compression_setup(
    self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph
) -> quantization.quantizer_setup.SingleConfigQuantizerSetup:
    nodes_to_compress = self.get_nodes_to_compress(model, nncf_graph)
    return self._algo.get_weight_compression_parameters(model, nncf_graph, nodes_to_compress)[0]
Suggested change (replacing the method above):
def get_nncf_weight_compression_parameters(
    self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph, dataset: Dataset
) -> tuple[list[WeightCompressionParameters], Optional[dict[str, WCTensorStatistic]]]:
    return self._algo.get_weight_compression_parameters(model, nncf_graph, dataset=dataset)
if self._sensitivity_metric is None:
    # Default case. It means that it is not defined by the user in the API.
    # Hence, the annotation (quantization parameters for all layers) from the quantizer will be used.
    all_weight_params = self._quantizer.get_weight_compression_setup(
        model, graph
    )  # Get weight compression params FROM QUANTIZER
    statistics, statistic_points = self._algo.collect_weight_compression_statistics(
        model, graph, dataset, all_weight_params, statistic_points
    )
else:
    # Data-aware mixed precision is used. In this case, only nodes_to_compress is obtained from the quantizer.
    nodes_to_compress = self._quantizer.get_nodes_to_compress(
        model, graph
    )  # Get nodes to compress FROM QUANTIZER
    all_weight_params, statistics = self._algo.get_weight_compression_parameters(
        model, graph, nodes_to_compress, statistic_points, dataset
    )
Suggested change (replacing the branch above):
all_weight_params, statistics = self._quantizer.get_nncf_weight_compression_parameters(..., dataset=dataset)
Here, we might not be able to achieve our 2 goals:
- data-aware mixed precision if the sensitivity is defined by the user.
- data-free mixed precision if it is not.
In the case of an external quantizer, we will not be able to call the mixed precision algorithm inside the quantizer adapter to return weight compression params for mixed precision.
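To illustrate the concern, a hedged sketch of what an adapter around an external (non-NNCF) torch.ao quantizer can realistically expose; the class and method bodies here are illustrative only:

```python
class ExternalQuantizerAdapter:
    """Sketch of an adapter over a third-party torch.ao quantizer."""

    def __init__(self, quantizer):
        self._quantizer = quantizer

    def get_nodes_to_compress(self, model, nncf_graph):
        # The external quantizer knows which nodes it annotates for compression.
        ...

    def get_weight_compression_parameters(self, model, nncf_graph):
        # It can also translate its annotation into per-layer configs (data-free case),
        # but it has no notion of NNCF's sensitivity metrics, so it cannot run the
        # data-aware mixed-precision assignment itself; that branch has to stay in
        # the WC algorithm.
        ...
```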
all_weight_params, statistics = self.get_weight_compression_parameters(
    model, graph, nodes_to_compress, statistic_points, dataset
)
transformed_model = self.apply_wc_algos(model, graph, all_weight_params, statistics, dataset)
Please come up with a docstring as well
Suggested change:
- transformed_model = self.apply_wc_algos(model, graph, all_weight_params, statistics, dataset)
+ transformed_model = self.apply_with_compression_parameters(model, graph, all_weight_params, statistics, dataset)
Changes
Introduced a new API to offer the weight compression algorithm for quantizers defined in torch.ao.
Currently, only the OpenVINOQuantizer is supported.
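For context, a usage sketch of the intended flow; the compress_pt2e entry point and the OpenVINOQuantizer import path are assumptions based on this discussion, not confirmed public API:

```python
import torch
import torch.nn as nn

# Assumed import locations; the merged code may expose these differently.
from executorch.backends.openvino.quantizer import OpenVINOQuantizer
from nncf.experimental.torch.fx import compress_pt2e

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU()).eval()
example_input = torch.randn(1, 64)
exported_model = torch.export.export(model, (example_input,)).module()

# The quantizer supplies the per-layer annotation; the new API applies the NNCF
# weight compression algorithm according to that annotation.
quantizer = OpenVINOQuantizer()
compressed_model = compress_pt2e(exported_model, quantizer)
```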
Reason for changes
To support quantizers defined in torch.ao.
Related tickets
169342