-
Notifications
You must be signed in to change notification settings - Fork 435
Added filename #440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added filename #440
Conversation
The latest updates on your projects. Learn more about Vercel for Git βοΈ
|
WalkthroughThe changes introduce a Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant EvaluationPipeline
participant SyntheticIntentGenerator
participant W&B
User->>CLI: Run evaluation/generation command with --dataset-filename
CLI->>EvaluationPipeline: run(dataset_artifact, dataset_filename, ...)
EvaluationPipeline->>SyntheticIntentGenerator: generate(dataset_artifact, dataset_filename, ...)
SyntheticIntentGenerator->>W&B: Save dataset as artifact using dataset_filename
EvaluationPipeline->>W&B: Load dataset using dataset_filename
EvaluationPipeline->>EvaluationPipeline: Evaluate dataset
EvaluationPipeline->>W&B: Log results with dataset_filename in config
Poem
π Recent review detailsConfiguration used: CodeRabbit UI π Files selected for processing (2)
π§ Files skipped from review as they are similar to previous changes (2)
β¨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. πͺ§ TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
π§Ή Nitpick comments (7)
backend/evals/synthetic_intent_generator.py (3)
112-112
: Update the docstring to document the new parameter.The method signature correctly adds the
dataset_filename
parameter, but the docstring should be updated to describe this parameter.def _save_to_wandb(self, df: pd.DataFrame, dataset_artifact: str, dataset_filename: str) -> str: """ Save the dataset as a wandb artifact. Args: df: DataFrame containing the generated dataset dataset_artifact: Name for the artifact + dataset_filename: Filename to save the dataset CSV to Returns: The artifact name for reference """
142-149
: Consider adding error handling for file operations.The direct file writing approach is cleaner than using temporary files, but consider adding error handling for potential issues like directory creation and file write permissions.
# Write dataframe to the temporary file + os.makedirs(os.path.dirname(dataset_filename), exist_ok=True) df.to_csv(dataset_filename, index=False) # Add the file to the artifact artifact.add_file(dataset_filename) # Log the artifact wandb.log_artifact(artifact)
Note: This would require adding
import os
back to the imports.
154-154
: Update the docstring to document the new parameter.The method signature correctly adds the
dataset_filename
parameter, but the docstring should be updated.def generate( self, dataset_artifact: str, dataset_filename: str, limit: int | None = None, ) -> pd.DataFrame: """ Generate synthetic intents and save them. Args: + dataset_artifact: Name for the wandb artifact + dataset_filename: Filename to save the dataset CSV to limit: Optional limit on number of samples to generate Returns: - The name of the saved artifact + DataFrame containing the generated dataset """backend/evals/evaluation_pipeline.py (4)
64-76
: Fix the docstring and document the new parameter.The implementation correctly uses the
dataset_filename
parameter, but the docstring has issues.def _load_dataset_from_wandb(self, artifact_name: str, dataset_filename: str) -> pd.DataFrame: """ Load a dataset from a W&B artifact. Args: artifact_name: Name of the W&B artifact - dataset_filename: Filename to save the dataset to + dataset_filename: Filename of the dataset CSV within the artifact Returns: DataFrame containing the dataset """
78-98
: Update the docstring to document the new parameter.The method signature and parameter propagation are correct, but the docstring should document the new parameter.
def _generate( self, dataset_artifact: str, dataset_filename: str, generation_limit: int | None = None, ) -> pd.DataFrame: """ Generate synthetic intents. Args: dataset_artifact: Name of the artifact to save the dataset to + dataset_filename: Filename to save the dataset CSV to generation_limit: Optional limit on number of samples to generate Returns: DataFrame containing the generated dataset """
104-124
: Update the docstring to document the new parameter.The implementation correctly uses the
dataset_filename
parameter, but the docstring should document it.def _evaluate( self, dataset_artifact: str, dataset_filename: str, evaluation_samples: int | None = None, df: pd.DataFrame | None = None, ) -> dict: """ Evaluate search performance on a dataset. Args: dataset_artifact: Name of the dataset artifact to evaluate + dataset_filename: Filename of the dataset CSV within the artifact evaluation_samples: Optional limit on number of samples to evaluate df: Optional DataFrame containing the dataset. If None, load from dataset_artifact Returns: Dictionary containing evaluation metrics """
π§° Tools
πͺ Pylint (3.3.7)
[warning] 123-123: Use lazy % formatting in logging functions
(W1203)
147-200
: Update the docstring to document the new parameter.The implementation correctly propagates the
dataset_filename
parameter and includes it in wandb config, but the docstring should document it.def run( self, dataset_artifact: str, dataset_filename: str, generate_data: bool = False, evaluate_data: bool = True, generation_limit: int | None = None, evaluation_samples: int | None = None, ) -> None: """ Run the evaluation pipeline. Args: + dataset_artifact: Name of dataset artifact to use + dataset_filename: Filename of the dataset CSV within the artifact generate_data: Whether to generate new data evaluate_data: Whether to evaluate data - dataset_artifact: Name of dataset artifact to use generation_limit: Optional limit on number of samples to generate evaluation_samples: Optional limit on number of samples to evaluate Returns: - Dictionary containing evaluation metrics if evaluation was performed, None otherwise + None """π§° Tools
πͺ Pylint (3.3.7)
[refactor] 147-147: Too many arguments (7/5)
(R0913)
[refactor] 147-147: Too many positional arguments (7/5)
(R0917)
π Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
π Files selected for processing (2)
backend/evals/evaluation_pipeline.py
(10 hunks)backend/evals/synthetic_intent_generator.py
(3 hunks)
π Additional comments (4)
backend/evals/synthetic_intent_generator.py (1)
183-183
: LGTM!The parameter propagation is correct and consistent with the updated method signature.
backend/evals/evaluation_pipeline.py (3)
15-15
: LGTM!Good addition of a sensible default filename constant.
214-225
: LGTM!The CLI option changes are well-designed with clear naming and appropriate defaults. The separation of
--dataset-artifact
and--dataset-filename
provides better clarity.
228-267
: LGTM!The main function signature and pipeline invocation correctly handle the new
dataset_filename
parameter. All parameters are properly passed through.π§° Tools
πͺ Pylint (3.3.7)
[convention] 244-244: Line too long (115/100)
(C0301)
π·οΈ Ticket
[link the issue or ticket you are addressing in this PR here, or use the Development
section on the right sidebar to link the issue]
π Description
Specify the filename to upload to wandb
π₯ Demo (if applicable)
πΈ Screenshots (if applicable)
β Checklist
Summary by CodeRabbit