
Conversation

Collaborator

@anzr299 commented Sep 22, 2025

Changes

Introduced a new API that offers the weight compression algorithm for quantizers defined in torch.ao.
Currently, only the OpenVINO Quantizer is supported.
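
A minimal usage sketch of the intended flow; the compress_pt2e entry point name and the OpenVINOQuantizer import path are assumptions for illustration, not the final API:

import torch

from nncf.experimental.torch.fx import compress_pt2e  # assumed entry point; the final name may differ
from executorch.backends.openvino.quantizer import OpenVINOQuantizer  # assumed import path

# Toy model, captured into the torch.fx representation that torch.ao quantizers operate on.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 8))
example_inputs = (torch.randn(1, 16),)
captured_model = torch.export.export(model, example_inputs).module()

# The quantizer annotates which weights to compress and with which configuration;
# the new API runs the NNCF weight compression algorithm based on that annotation.
quantizer = OpenVINOQuantizer()
compressed_model = compress_pt2e(captured_model, quantizer=quantizer)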

Reason for changes

To support quantizers defined in torch.ao.

Related tickets

169342

@anzr299 requested a review from a team as a code owner September 22, 2025 14:43
@github-actions bot added the API Public API-impacting changes label Sep 22, 2025
@anzr299 marked this pull request as draft September 22, 2025 14:56
@daniil-lyakhov self-requested a review September 22, 2025 15:03
Comment on lines 163 to 164
model: torch.fx.GraphModule,
quantizer: Quantizer,
Collaborator

Please apply linters

Comment on lines 12 to 14

from abc import ABC
from abc import abstractmethod
Collaborator

Please import the observers from ExecuTorch.

Collaborator

@daniil-lyakhov left a comment

Can I see the PR with OpenVINOQuantizer?

from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression


class WeightsCompressionPT2E(Algorithm):
Collaborator

This algorithm is not designed specifically for PT2E; it is an experimental WC algorithm which could be implemented in any backend.

Suggested change
class WeightsCompressionPT2E(Algorithm):
class WeightCompression(Algorithm):

Collaborator Author

@anzr299 Sep 24, 2025

Should I rename it to ExperimentalWeightCompression instead, since it could be confused with the original?

Collaborator

@daniil-lyakhov Sep 24, 2025

It is inside the experimental directory; that should be descriptive enough. I suggest the WeightCompression name.


import torch

import nncf # type: ignore[import-untyped]
Collaborator

Why # type: ignore[import-untyped] here?

Collaborator Author

I need to update the type-hint ignores, since I copied them over from my scripts.

Comment on lines +34 to +35
) -> torch.fx.GraphModule:
self._quantizer = quantizer
Collaborator

Type hints and a docstring are missing.
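
For illustration, a hedged sketch of what the fully annotated method could look like; the parameter list, import paths, and docstring wording are assumptions based on the snippets quoted elsewhere in this review, not the actual signature:

from typing import Optional

import torch

from nncf.common.graph import NNCFGraph
from nncf.common.tensor_statistics.statistic_point import StatisticPointsContainer
from nncf.data import Dataset


# Inside the algorithm class:
def apply(
    self,
    model: torch.fx.GraphModule,
    graph: NNCFGraph,
    statistic_points: Optional[StatisticPointsContainer] = None,
    dataset: Optional[Dataset] = None,
) -> torch.fx.GraphModule:
    """
    Compresses the model weights using the configuration provided by the attached
    torch.ao quantizer.

    :param model: Captured torch.fx.GraphModule to compress.
    :param graph: NNCFGraph built for the model.
    :param statistic_points: Optional pre-collected statistic points.
    :param dataset: Optional calibration dataset for data-aware compression.
    :return: The transformed torch.fx.GraphModule with compressed weights.
    """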

model,
parameters={
"mode": self._mode.value,
"mode": self._mode.value if not isinstance(self._mode, str) else self._mode,
Collaborator

What is the str mode here? Can we force self._mode to always be an ENUM param?
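
One way to enforce this, as a sketch: normalize the argument once at construction time so that self._mode is always an enum member (assuming nncf.CompressWeightsMode is the enum in question):

from typing import Union

import nncf


def _as_compress_weights_mode(mode: Union[str, nncf.CompressWeightsMode]) -> nncf.CompressWeightsMode:
    # Accept either the enum member or its string value, but always store the enum,
    # so downstream code can use `self._mode.value` unconditionally.
    return nncf.CompressWeightsMode(mode) if isinstance(mode, str) else mode


# e.g. in __init__: self._mode = _as_compress_weights_mode(mode)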

Comment on lines 77 to 93
if self._sensitivity_metric == nncf.SensitivityMetric.WEIGHT_QUANTIZATION_ERROR:
# Default case. It means that it is not defined by the user in the API
# Hence, the annotation(Quantization parameters for all layers) from the quantizer will be used.
all_weight_params = self._quantizer.get_weight_compression_setup(
model, graph
) # Get weight compression params FROM QUANTIZER
statistics, statistic_points = self._algo.collect_weight_compression_statistics(
model, graph, dataset, all_weight_params, statistic_points
)
else:
# Data Aware mixed precision is used. In this case, only nodes_to_compress is obtained from the quantizer
nodes_to_compress = self._quantizer.get_nodes_to_compress(
model, graph
) # Get nodes to compress FROM QUANTIZER
all_weight_params, statistics = self._algo.get_weight_compression_parameters(
model, graph, nodes_to_compress, statistic_points, dataset
)
Collaborator

This looks really puzzling, please share the OpenVINOQuantizer with me

Collaborator

@daniil-lyakhov left a comment

Can we just pass the dataset param to quantizer.get_nncf_weight_compression_parameters and simplify the pipeline? With that, we wouldn't need the get_nodes_to_compress and collect_weight_compression_statistics methods in the WC algorithm.
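
A rough sketch of the simplified call site this proposal seems to imply; the method name matches the suggestion further down, while how the adapter and the algorithm split the work is an assumption:

# Sketch of the experimental WeightCompression.apply() under this proposal.
def apply(self, model, graph, statistic_points=None, dataset=None):
    # The sensitivity-metric branching is replaced by a single adapter call; the adapter
    # decides internally whether the dataset is needed to collect statistics.
    all_weight_params, statistics = self._quantizer.get_nncf_weight_compression_parameters(
        model, graph, dataset=dataset
    )
    return self._algo.apply_wc_algos(model, graph, all_weight_params, statistics, dataset)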

def get_quantization_setup(self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph) -> SingleConfigQuantizerSetup:
return self._quantizer.get_nncf_quantization_setup(model, nncf_graph)

def get_weight_compression_setup(
Collaborator

Please do not use the word setup in the context of WC.

Suggested change
def get_weight_compression_setup(
def get_weight_compression_params(

Comment on lines 157 to 161
def get_nncf_weight_compression_setup(
self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph
) -> quantization.quantizer_setup.SingleConfigQuantizerSetup:
nodes_to_compress = self.get_nodes_to_compress(model, nncf_graph)
return self._algo.get_weight_compression_parameters(model, nncf_graph, nodes_to_compress)[0]
Collaborator

@daniil-lyakhov Sep 24, 2025

Suggested change
def get_nncf_weight_compression_setup(
self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph
) -> quantization.quantizer_setup.SingleConfigQuantizerSetup:
nodes_to_compress = self.get_nodes_to_compress(model, nncf_graph)
return self._algo.get_weight_compression_parameters(model, nncf_graph, nodes_to_compress)[0]
def get_nncf_weight_compression_parameters(
self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph, dataset: Dataset
) -> tuple[list[WeightCompressionParameters], Optional[dict[str, WCTensorStatistic]]]:
return self._algo.get_weight_compression_parameters(model, nncf_graph, dataset=dataset)

Comment on lines +76 to +92
if self._sensitivity_metric is None:
# Default case. It means that it is not defined by the user in the API
# Hence, the annotation(Quantization parameters for all layers) from the quantizer will be used.
all_weight_params = self._quantizer.get_weight_compression_setup(
model, graph
) # Get weight compression params FROM QUANTIZER
statistics, statistic_points = self._algo.collect_weight_compression_statistics(
model, graph, dataset, all_weight_params, statistic_points
)
else:
# Data Aware mixed precision is used. In this case, only nodes_to_compress is obtained from the quantizer
nodes_to_compress = self._quantizer.get_nodes_to_compress(
model, graph
) # Get nodes to compress FROM QUANTIZER
all_weight_params, statistics = self._algo.get_weight_compression_parameters(
model, graph, nodes_to_compress, statistic_points, dataset
)
Collaborator

@daniil-lyakhov Sep 24, 2025

Suggested change
if self._sensitivity_metric is None:
# Default case. It means that it is not defined by the user in the API
# Hence, the annotation(Quantization parameters for all layers) from the quantizer will be used.
all_weight_params = self._quantizer.get_weight_compression_setup(
model, graph
) # Get weight compression params FROM QUANTIZER
statistics, statistic_points = self._algo.collect_weight_compression_statistics(
model, graph, dataset, all_weight_params, statistic_points
)
else:
# Data Aware mixed precision is used. In this case, only nodes_to_compress is obtained from the quantizer
nodes_to_compress = self._quantizer.get_nodes_to_compress(
model, graph
) # Get nodes to compress FROM QUANTIZER
all_weight_params, statistics = self._algo.get_weight_compression_parameters(
model, graph, nodes_to_compress, statistic_points, dataset
)
all_weight_params, statistics = self._quantizer.get_nncf_weight_compression_parameters(..., dataset=dataset)

Collaborator Author

Here, we might not be able to achieve our two goals:

  1. data-aware mixed precision if the sensitivity metric is defined by the user;
  2. data-free mixed precision if it is not.

In the case of an external quantizer, we will not be able to call the mixed precision algorithm inside the quantizer adapter to return the weight compression params for mixed precision.

all_weight_params, statistics = self.get_weight_compression_parameters(
model, graph, nodes_to_compress, statistic_points, dataset
)
transformed_model = self.apply_wc_algos(model, graph, all_weight_params, statistics, dataset)
Collaborator

Please come up with a docstring as well

Suggested change
transformed_model = self.apply_wc_algos(model, graph, all_weight_params, statistics, dataset)
transformed_model = self.apply_with_compression_parameters(model, graph, all_weight_params, statistics, dataset)
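
For example, a possible docstring for the renamed method; the wording and parameter descriptions here are assumptions:

def apply_with_compression_parameters(self, model, graph, all_weight_params, statistics, dataset):
    """
    Applies the weight compression transformations described by the already resolved
    per-weight parameters.

    :param model: torch.fx.GraphModule to transform.
    :param graph: NNCFGraph built for the model.
    :param all_weight_params: Weight compression parameters for each weight to be compressed.
    :param statistics: Pre-collected weight statistics, or None for data-free modes.
    :param dataset: Calibration dataset used by the data-aware steps, if any.
    :return: The transformed model with compressed weights.
    """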
