Show sample for each Level type on ROBOT report #134
base: master
Changes from all commits
6342501
32c9dbb
d48181b
634ede8
@@ -2,20 +2,23 @@

import argparse
import json
import logging
import os
import re
import sys

import pandas as pd
from jinja2 import Template

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')


def main(args):
    """
    """
    parser = argparse.ArgumentParser(description='Create a report HTML page')
    parser.add_argument('report',
-                       type=argparse.FileType('r'),
+                       type=argparse.FileType('r+'),
Consider using a separate output file instead of overwriting the input.
Opening the report file in 'r+' mode and overwriting it at line 82 is risky. If an error occurs during processing, the original data could be lost. Consider adding a separate output parameter for the filtered report:
 parser.add_argument('report',
-                    type=argparse.FileType('r+'),
+                    type=argparse.FileType('r'),
                     help='TSV report to convert to HTML')
+parser.add_argument('--output-report',
+                    type=argparse.FileType('w'),
+                    help='Output filtered report TSV file (optional)')
Then modify line 82:
-if len(report_filtered) > args.limitlines:
-    report_filtered.to_csv(args.report, sep="\t", index=False)
+if args.output_report and len(report) > args.limitlines:
+    report_filtered.to_csv(args.output_report, sep="\t", index=False)
Also applies to: 82-82
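The suggestion above can be sketched end to end with the standard library only. This is a minimal illustration, not the PR's code: `build_parser`, `filter_report`, and `demo` are hypothetical names, plain path strings stand in for `argparse.FileType`, `csv` stands in for pandas, and the report is assumed to have a header row.

```python
import argparse
import csv
import os
import tempfile


def build_parser():
    # Hypothetical parser: the input path is only ever read; the filtered
    # copy goes to a separate, optional path.
    parser = argparse.ArgumentParser(description="Create a report HTML page")
    parser.add_argument("report", help="TSV report to convert to HTML")
    parser.add_argument("--output-report", default=None,
                        help="Optional path for the filtered TSV report")
    parser.add_argument("--limitlines", type=int, default=2,
                        help="Maximum number of data rows to keep")
    return parser


def filter_report(args):
    # Open the input read-only so a failure later cannot damage it.
    with open(args.report, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    header, body = rows[0], rows[1:]  # assumes a header row is present
    filtered = body[:args.limitlines]
    # Write only when an output path was given and rows were actually dropped.
    if args.output_report and len(body) > args.limitlines:
        with open(args.output_report, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerow(header)
            writer.writerows(filtered)
    return filtered


def demo():
    # Build a three-row report, filter it to two rows, and check both files.
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "report.tsv")
        target = os.path.join(tmp, "filtered.tsv")
        original = "Level\tRule Name\nERROR\ta\nWARN\tb\nINFO\tc\n"
        with open(source, "w", newline="") as handle:
            handle.write(original)
        args = build_parser().parse_args([source, "--output-report", target])
        filter_report(args)
        with open(source, newline="") as handle:
            after = handle.read()
        with open(target, newline="") as handle:
            written = handle.read()
    return original == after, written
```

Calling `demo()` leaves the input file byte-for-byte unchanged and writes the header plus the first two rows to the separate output path.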
                        help='TSV report to convert to HTML')
    parser.add_argument('context',
                        type=argparse.FileType('r'),
@@ -38,20 +41,58 @@ def main(args):

    error_count_rule = {}
    error_count_level = {}
    report_filtered = pd.DataFrame()

    try:
        report = pd.read_csv(args.report, sep="\t")

        # Get sample of each level only for ROBOT report
        if "Level" in report.columns and "Rule Name" in report.columns:
            error_count_level = report["Level"].value_counts()
            error_count_rule = report["Rule Name"].value_counts()
-   except Exception:
-       print("No report")

            error_count_error = error_count_level.get("ERROR", 0)
            if error_count_error < args.limitlines:
                rest = args.limitlines - error_count_level["ERROR"]

                # Calculate the sample number for each level based on group size
                def calculate_sample_size(group, rest):
                    if group["Level"].iloc[0] == "ERROR":
                        return group.shape[0]

                    return min(group.shape[0], rest)

Comment on lines +56 to +64
Fix the sampling logic to properly distribute remaining lines.
The current logic incorrectly uses the full `rest` value for each non-ERROR group. The sampling should distribute the remaining lines proportionally among non-ERROR levels:
-error_count_error = error_count_level.get("ERROR", 0)
-if error_count_error < args.limitlines:
-    rest = args.limitlines - error_count_level["ERROR"]
-
-    # Calculate the sample number for each level based on group size
-    def calculate_sample_size(group, rest):
-        if group["Level"].iloc[0] == "ERROR":
-            return group.shape[0]
-
-        return min(group.shape[0], rest)
+error_count_error = error_count_level.get("ERROR", 0)
+if error_count_error < args.limitlines:
+    rest = args.limitlines - error_count_error
+
+    # Calculate proportional sample sizes for non-ERROR levels
+    non_error_counts = {level: count for level, count in error_count_level.items() if level != "ERROR"}
+    total_non_error = sum(non_error_counts.values())
+
+    level_sample_sizes = {"ERROR": error_count_error}
+    if total_non_error > 0:
+        for level, count in non_error_counts.items():
+            # Proportionally distribute remaining lines
+            level_sample_sizes[level] = int(rest * count / total_non_error)
+
+    def calculate_sample_size(group):
+        level = group["Level"].iloc[0]
+        group_key = (level, group["Rule Name"].iloc[0], group["Subject"].iloc[0])
+        # Distribute level sample size among groups of that level
+        level_sample = level_sample_sizes.get(level, 0)
+        # This is a simplified approach - you may want to further distribute among groups
+        return min(group.shape[0], max(1, level_sample))
Also, line 56 still uses direct dictionary access instead of the `error_count_error` value computed above:
-rest = args.limitlines - error_count_level["ERROR"]
+rest = args.limitlines - error_count_error
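To make the proportional split concrete, here is a small standalone sketch that works on plain level counts rather than a DataFrame. The function name `allocate_samples` is hypothetical, and the sketch assumes the caller has already checked that the ERROR rows fit within the limit (the diff handles the other case with `report.head`). Shares are floored, so the allocated total can fall slightly below the limit.

```python
def allocate_samples(level_counts, limit):
    """Keep every ERROR row, then split the remaining budget across other levels.

    level_counts maps a level name to its row count; limit is the line budget.
    """
    error_count = level_counts.get("ERROR", 0)
    rest = max(limit - error_count, 0)
    others = {level: n for level, n in level_counts.items() if level != "ERROR"}
    total_others = sum(others.values())

    sizes = {"ERROR": error_count} if error_count else {}
    if total_others > 0:
        for level, n in others.items():
            # Floor of the proportional share, capped at the rows available.
            sizes[level] = min(n, rest * n // total_others)
    return sizes
```

For example, with 10 ERROR, 60 WARN, and 30 INFO rows and a limit of 40, the remaining 30 lines are split 20 to WARN and 10 to INFO.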
                required_columns = ["Level", "Rule Name", "Subject", "Property", "Value"]
                missing_columns = [col for col in required_columns if col not in report.columns]
                if missing_columns:
                    raise KeyError(f"Missing columns in report: {missing_columns}")

                # Get a sample of each Level type
                report_filtered = report.groupby(by=["Level","Rule Name","Subject"])[
                    ["Level", "Rule Name", "Subject", "Property", "Value"]
                ].apply(
                    lambda x: x.sample(calculate_sample_size(x, rest))
                ).reset_index(drop=True)
Comment on lines +71 to +75
🛠️ Refactor suggestion
Make the sampling more robust against edge cases.
The current implementation could fail if groups are empty or if sampling parameters are invalid. Consider adding error handling:
 # Get a sample of each Level type
-report_filtered = report.groupby(by=["Level","Rule Name","Subject"])[
-    ["Level", "Rule Name", "Subject", "Property", "Value"]
-].apply(
-    lambda x: x.sample(calculate_sample_size(x, rest))
-).reset_index(drop=True)
+def safe_sample(group, sample_size_func):
+    try:
+        n = sample_size_func(group)
+        if n > 0 and len(group) > 0:
+            return group.sample(min(n, len(group)))
+        return group
+    except Exception as e:
+        logging.warning(f"Sampling failed for group: {e}")
+        return group
+
+report_filtered = report.groupby(by=["Level","Rule Name","Subject"])[
+    ["Level", "Rule Name", "Subject", "Property", "Value"]
+].apply(
+    lambda x: safe_sample(x, calculate_sample_size)
+).reset_index(drop=True)
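The core of the guard is clamping the requested size before sampling, since sampling without replacement raises an error when asked for more items than exist. The sketch below shows that idea on a plain list, with `random.sample` standing in for `DataFrame.sample`; the `seed` parameter is an addition for reproducibility. Unlike the reviewer's version, which returns the whole group when the size is not positive, this sketch returns an empty list in that case.

```python
import random


def safe_sample(rows, n, seed=None):
    """Return up to n rows chosen without replacement; never raise on odd sizes."""
    if n <= 0 or not rows:
        return []
    rng = random.Random(seed)
    # Clamp n so the request can never exceed the population size.
    return rng.sample(rows, min(n, len(rows)))
```

Requesting ten rows from a five-row group simply returns all five in random order.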
            else:
                report_filtered = report.head(args.limitlines)
        else:
            report_filtered = report.head(args.limitlines)

        if len(report_filtered) > args.limitlines:
            report_filtered.to_csv(args.report, sep="\t", index=False)
anitacaron marked this conversation as resolved.
Comment on lines +81 to +82
Remove or fix the illogical condition.
The condition `len(report_filtered) > args.limitlines` is illogical. Either remove this block or change the logic to save the filtered report when the original report exceeds the limit:
-if len(report_filtered) > args.limitlines:
-    report_filtered.to_csv(args.report, sep="\t", index=False)
+# Save filtered report if original exceeded limit
+if len(report) > args.limitlines and args.output_report:
+    report_filtered.to_csv(args.output_report, sep="\t", index=False)

    except pd.errors.EmptyDataError as e:
        logging.error("Empty data error: %s", e)
    except FileNotFoundError as e:
        logging.error("File not found: %s", e)
    except Exception as e:
        logging.error("An unexpected error occurred: %s", e)

    # Load Jinja2 template
    template = Template(args.template.read())

    # Generate the HTML output
-   res = template.render(contents=report.head(args.limitlines),
+   res = template.render(contents=report_filtered.reset_index(drop=True),
                          maybe_get_link=maybe_get_link,
                          context=context,
                          title=args.title,