keboola/osiris_pipeline

Osiris Pipeline v0.1.0 - Conversational ETL Pipeline Generator

MVP: A simple proof-of-concept that uses AI to generate ETL pipelines from a natural-language conversation.

🚀 Quick Start

# Setup
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Initialize configuration
python osiris.py init

# Start conversation
python osiris.py chat

Example Conversation

$ python osiris.py chat

You: "Show me top 10 customers by revenue"

Bot: I'll help analyze your top customers! Let me discover your database...
     Found tables: customers, orders. I'll create a pipeline that joins these 
     and calculates total revenue per customer.

     Here's the generated pipeline:
     [Shows YAML pipeline]
     
     Does this look correct?

You: "Perfect, run it!"

Bot: ✓ Pipeline executed! Found 10 customers, saved to output/results.csv
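
To make the example more concrete, here is a minimal sketch of the kind of join-and-aggregate query the conversation describes. It is not the code Osiris generates. The column names (`id`, `name`, `customer_id`, `amount`) and the in-memory SQLite database are assumptions chosen so the sketch runs without a MySQL or Supabase connection; only the `customers` and `orders` table names come from the example above.

```python
import sqlite3


def top_customers_by_revenue(conn, limit=10):
    """Join customers to orders and rank customers by summed order amount."""
    query = """
        SELECT c.id, c.name, SUM(o.amount) AS total_revenue
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY total_revenue DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
        INSERT INTO orders (customer_id, amount)
        VALUES (1, 100.0), (1, 50.0), (2, 300.0), (3, 20.0);
        """
    )
    return conn


if __name__ == "__main__":
    for row in top_customers_by_revenue(build_demo_db()):
        print(row)
```

In a real run the schema would come from database discovery rather than being hard-coded as it is here.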

🎯 Pro Mode - Custom LLM Prompts

Osiris includes a pro mode that lets advanced users customize the AI system prompts:

# Export system prompts for customization
python osiris.py dump-prompts --export

# Edit prompts in .osiris_prompts/ directory:
# - conversation_system.txt    # Main AI personality & behavior  
# - sql_generation_system.txt  # SQL generation instructions
# - user_prompt_template.txt   # User context building template

# Use your custom prompts
python osiris.py chat --pro-mode

Use Cases:

  • 🏥 Domain-specific: Adapt for healthcare, finance, retail terminology
  • 🎨 Response style: Make AI more technical, concise, or detailed
  • 🌍 Multi-language: Adapt prompts for different languages
  • Performance: Refine prompt wording for better response quality
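
One way to picture pro mode is as a loader that prefers edited prompt files and falls back to built-in defaults. The sketch below is an illustration under that assumption, not the Osiris implementation. The function `load_prompts` and the placeholder `DEFAULT_PROMPTS` are hypothetical; only the `.osiris_prompts/` directory and the three file names come from the commands above.

```python
from pathlib import Path

# File names taken from the export step above.
PROMPT_FILES = (
    "conversation_system.txt",
    "sql_generation_system.txt",
    "user_prompt_template.txt",
)

# Hypothetical placeholders standing in for the built-in prompts.
DEFAULT_PROMPTS = {name: f"<built-in default for {name}>" for name in PROMPT_FILES}


def load_prompts(prompt_dir=".osiris_prompts", pro_mode=False):
    """Return a prompt per file name, using custom files only in pro mode."""
    prompts = dict(DEFAULT_PROMPTS)
    if not pro_mode:
        return prompts
    base = Path(prompt_dir)
    for name in PROMPT_FILES:
        path = base / name
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            if text:  # ignore empty files and keep the default instead
                prompts[name] = text
    return prompts


if __name__ == "__main__":
    for name, text in load_prompts(pro_mode=True).items():
        print(f"{name}: {text[:60]}")
```

Falling back per file means a user can edit only `conversation_system.txt` and leave the other two prompts untouched.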

MVP Features

  • 🤖 AI Chat Interface: Conversational pipeline creation with natural language
  • 🎯 Custom LLM Prompts: Pro mode allows customizing AI system prompts for domain-specific use
  • 🔧 Multi-Database Support: MySQL, Supabase (PostgreSQL), and CSV file processing
  • 📋 YAML Pipeline Generation: Structured, reusable pipeline format
  • ✅ Human-in-the-Loop: Manual validation and approval before execution
  • 🎨 Rich Terminal UI: Beautiful formatted output with colors, tables, and progress indicators
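
The human-in-the-loop step can be sketched as a gate that shows the generated pipeline and runs it only after explicit approval. This is a hedged illustration, not the actual Osiris code: `confirm_and_run`, its `run` callback, and the `[y/N]` prompt format are hypothetical names and choices introduced here.

```python
def confirm_and_run(pipeline_yaml, run, ask=input):
    """Show the pipeline and call run() only if the user approves.

    `run` is any callable that executes the pipeline text; `ask` defaults
    to input() and can be swapped out for testing.
    """
    print(pipeline_yaml)
    answer = ask("Does this look correct? [y/N] ").strip().lower()
    if answer in {"y", "yes"}:
        return run(pipeline_yaml)
    # Anything other than an explicit yes leaves the pipeline unexecuted.
    return None


if __name__ == "__main__":
    result = confirm_and_run(
        "# placeholder pipeline text",
        run=lambda text: "executed",
    )
    print(result)
```

Defaulting to "no" keeps an accidental Enter keypress from executing a pipeline that has not been reviewed.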

Note: This is an early prototype. Many features are experimental.

Supported Sources

  • MySQL/MariaDB: Full extraction and loading support
  • Supabase: Cloud PostgreSQL with real-time capabilities
  • CSV Files: Local file processing

Documentation

Core Documentation

Examples & Usage

Development Archive

License

Apache-2.0
