
Conversation

Collaborator

@anzr299 commented Sep 22, 2025

Changes

Introduced a new API that offers the weight compression algorithm for quantizers defined in torch.ao.
Currently, only the OpenVINO Quantizer is supported.
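
A minimal usage sketch of the intended flow; the compress_pt2e entry point name and the OpenVINOQuantizer import path are assumptions for illustration, not the final API:

import torch

from nncf.experimental.torch.fx import compress_pt2e  # assumed entry point; the final name may differ
from executorch.backends.openvino.quantizer import OpenVINOQuantizer  # assumed import path

# Toy model, captured into the torch.fx representation that torch.ao quantizers operate on.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 8))
example_inputs = (torch.randn(1, 16),)
captured_model = torch.export.export(model, example_inputs).module()

# The quantizer annotates which weights to compress and with which configuration;
# the new API runs the NNCF weight compression algorithm based on that annotation.
quantizer = OpenVINOQuantizer()
compressed_model = compress_pt2e(captured_model, quantizer=quantizer)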

Reason for changes

To support quantizers defined in torch.ao.

Related tickets

169342

@anzr299 requested a review from a team as a code owner September 22, 2025 14:43
@github-actions bot added the API Public API-impacting changes label Sep 22, 2025
@anzr299 marked this pull request as draft September 22, 2025 14:56
@daniil-lyakhov self-requested a review September 22, 2025 15:03
Comment on lines 163 to 164
model: torch.fx.GraphModule,
quantizer: Quantizer,
Collaborator

Please apply linters

Comment on lines 12 to 14

from abc import ABC
from abc import abstractmethod
Collaborator

Please import the observers from ExecuTorch.

Collaborator

@daniil-lyakhov left a comment

Can I see the PR with OpenVINOQuantizer?

from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression


class WeightsCompressionPT2E(Algorithm):
Collaborator

This algorithm is not designed specifically for PT2E; it is an experimental WC algorithm which could be implemented in any backend.

Suggested change
class WeightsCompressionPT2E(Algorithm):
class WeightCompression(Algorithm):

Collaborator Author

@anzr299 Sep 24, 2025

Should I rename it to ExperimentalWeightCompression instead, since it could be confused with the original?

Collaborator

@daniil-lyakhov Sep 24, 2025

It is inside the experimental directory; that should be descriptive enough. I suggest the WeightCompression name.


import torch

import nncf # type: ignore[import-untyped]
Collaborator

Why # type: ignore[import-untyped] here?

Collaborator Author

I need to update the type-hint ignores, since I copied them over from my scripts.

Comment on lines +34 to +35
) -> torch.fx.GraphModule:
self._quantizer = quantizer
Collaborator

Type hints and a docstring are missing.
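
For illustration, a hedged sketch of what the fully annotated method could look like; the parameter list, import paths, and docstring wording are assumptions based on the snippets quoted elsewhere in this review, not the actual signature:

from typing import Optional

import torch

from nncf.common.graph import NNCFGraph
from nncf.common.tensor_statistics.statistic_point import StatisticPointsContainer
from nncf.data import Dataset


# Inside the algorithm class:
def apply(
    self,
    model: torch.fx.GraphModule,
    graph: NNCFGraph,
    statistic_points: Optional[StatisticPointsContainer] = None,
    dataset: Optional[Dataset] = None,
) -> torch.fx.GraphModule:
    """
    Compresses the model weights using the configuration provided by the attached
    torch.ao quantizer.

    :param model: Captured torch.fx.GraphModule to compress.
    :param graph: NNCFGraph built for the model.
    :param statistic_points: Optional pre-collected statistic points.
    :param dataset: Optional calibration dataset for data-aware compression.
    :return: The transformed torch.fx.GraphModule with compressed weights.
    """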

model,
parameters={
"mode": self._mode.value,
"mode": self._mode.value if not isinstance(self._mode, str) else self._mode,
Collaborator

What is the str mode here? Can we force self._mode to always be an ENUM param?
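
One way to enforce this, as a sketch: normalize the argument once at construction time so that self._mode is always an enum member (assuming nncf.CompressWeightsMode is the enum in question):

from typing import Union

import nncf


def _as_compress_weights_mode(mode: Union[str, nncf.CompressWeightsMode]) -> nncf.CompressWeightsMode:
    # Accept either the enum member or its string value, but always store the enum,
    # so downstream code can use `self._mode.value` unconditionally.
    return nncf.CompressWeightsMode(mode) if isinstance(mode, str) else mode


# e.g. in __init__: self._mode = _as_compress_weights_mode(mode)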

Comment on lines 77 to 93
if self._sensitivity_metric == nncf.SensitivityMetric.WEIGHT_QUANTIZATION_ERROR:
# Default case. It means that it is not defined by the user in the API
# Hence, the annotation(Quantization parameters for all layers) from the quantizer will be used.
all_weight_params = self._quantizer.get_weight_compression_setup(
model, graph
) # Get weight compression params FROM QUANTIZER
statistics, statistic_points = self._algo.collect_weight_compression_statistics(
model, graph, dataset, all_weight_params, statistic_points
)
else:
# Data Aware mixed precision is used. In this case, only nodes_to_compress is obtained from the quantizer
nodes_to_compress = self._quantizer.get_nodes_to_compress(
model, graph
) # Get nodes to compress FROM QUANTIZER
all_weight_params, statistics = self._algo.get_weight_compression_parameters(
model, graph, nodes_to_compress, statistic_points, dataset
)
Collaborator

This looks really puzzling, please share the OpenVINOQuantizer with me

Collaborator

@daniil-lyakhov left a comment

Can we just pass the dataset param to quantizer.get_nncf_weight_compression_parameters and simplify the pipeline? With that, we wouldn't need the get_nodes_to_compress and collect_weight_compression_statistics methods in the WC algorithm.
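
A rough sketch of the simplified call site this proposal seems to imply; the method name matches the suggestion further down, while how the adapter and the algorithm split the work is an assumption:

# Sketch of the experimental WeightCompression.apply() under this proposal.
def apply(self, model, graph, statistic_points=None, dataset=None):
    # The sensitivity-metric branching is replaced by a single adapter call; the adapter
    # decides internally whether the dataset is needed to collect statistics.
    all_weight_params, statistics = self._quantizer.get_nncf_weight_compression_parameters(
        model, graph, dataset=dataset
    )
    return self._algo.apply_wc_algos(model, graph, all_weight_params, statistics, dataset)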

def get_quantization_setup(self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph) -> SingleConfigQuantizerSetup:
return self._quantizer.get_nncf_quantization_setup(model, nncf_graph)

def get_weight_compression_setup(
Collaborator

Please do not use the word setup in the context of WC.

Suggested change
def get_weight_compression_setup(
def get_weight_compression_params(

Comment on lines 157 to 161
def get_nncf_weight_compression_setup(
self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph
) -> quantization.quantizer_setup.SingleConfigQuantizerSetup:
nodes_to_compress = self.get_nodes_to_compress(model, nncf_graph)
return self._algo.get_weight_compression_parameters(model, nncf_graph, nodes_to_compress)[0]
Collaborator

@daniil-lyakhov Sep 24, 2025

Suggested change
def get_nncf_weight_compression_setup(
self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph
) -> quantization.quantizer_setup.SingleConfigQuantizerSetup:
nodes_to_compress = self.get_nodes_to_compress(model, nncf_graph)
return self._algo.get_weight_compression_parameters(model, nncf_graph, nodes_to_compress)[0]
def get_nncf_weight_compression_parameters(
self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph, dataset: Dataset
) -> tuple[list[WeightCompressionParameters], Optional[dict[str, WCTensorStatistic]]]:
return self._algo.get_weight_compression_parameters(model, nncf_graph, dataset=dataset)

Comment on lines +76 to +92
if self._sensitivity_metric is None:
# Default case. It means that it is not defined by the user in the API
# Hence, the annotation(Quantization parameters for all layers) from the quantizer will be used.
all_weight_params = self._quantizer.get_weight_compression_setup(
model, graph
) # Get weight compression params FROM QUANTIZER
statistics, statistic_points = self._algo.collect_weight_compression_statistics(
model, graph, dataset, all_weight_params, statistic_points
)
else:
# Data Aware mixed precision is used. In this case, only nodes_to_compress is obtained from the quantizer
nodes_to_compress = self._quantizer.get_nodes_to_compress(
model, graph
) # Get nodes to compress FROM QUANTIZER
all_weight_params, statistics = self._algo.get_weight_compression_parameters(
model, graph, nodes_to_compress, statistic_points, dataset
)
Collaborator

@daniil-lyakhov Sep 24, 2025

Suggested change
if self._sensitivity_metric is None:
# Default case. It means that it is not defined by the user in the API
# Hence, the annotation(Quantization parameters for all layers) from the quantizer will be used.
all_weight_params = self._quantizer.get_weight_compression_setup(
model, graph
) # Get weight compression params FROM QUANTIZER
statistics, statistic_points = self._algo.collect_weight_compression_statistics(
model, graph, dataset, all_weight_params, statistic_points
)
else:
# Data Aware mixed precision is used. In this case, only nodes_to_compress is obtained from the quantizer
nodes_to_compress = self._quantizer.get_nodes_to_compress(
model, graph
) # Get nodes to compress FROM QUANTIZER
all_weight_params, statistics = self._algo.get_weight_compression_parameters(
model, graph, nodes_to_compress, statistic_points, dataset
)
all_weight_params, statistics = self._quantizer.get_nncf_weight_compression_parameters(..., dataset=dataset)

Collaborator Author

Here, we might not be able to achieve our two goals:

  1. data-aware mixed precision if the sensitivity metric is defined by the user;
  2. data-free mixed precision if it is not.

In the case of an external quantizer, we will not be able to call the mixed precision algorithm inside the quantizer adapter to return the weight compression params for mixed precision.

all_weight_params, statistics = self.get_weight_compression_parameters(
model, graph, nodes_to_compress, statistic_points, dataset
)
transformed_model = self.apply_wc_algos(model, graph, all_weight_params, statistics, dataset)
Collaborator

Please come up with a docstring as well

Suggested change
transformed_model = self.apply_wc_algos(model, graph, all_weight_params, statistics, dataset)
transformed_model = self.apply_with_compression_parameters(model, graph, all_weight_params, statistics, dataset)
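
For example, a possible docstring for the renamed method; the wording and parameter descriptions here are assumptions:

def apply_with_compression_parameters(self, model, graph, all_weight_params, statistics, dataset):
    """
    Applies the weight compression transformations described by the already resolved
    per-weight parameters.

    :param model: torch.fx.GraphModule to transform.
    :param graph: NNCFGraph built for the model.
    :param all_weight_params: Weight compression parameters for each weight to be compressed.
    :param statistics: Pre-collected weight statistics, or None for data-free modes.
    :param dataset: Calibration dataset used by the data-aware steps, if any.
    :return: The transformed model with compressed weights.
    """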
