danielaskdd

Fix ENTITY_TYPES Environment Variable Handling

🐛 Problem Description

The ENTITY_TYPES environment variable was incorrectly processed when provided as a JSON array string, causing entity extraction to fail completely. Instead of parsing the JSON string into a proper Python list, the system was converting it character-by-character, resulting in unusable single-character tokens being passed to the LLM.

Before Fix:

ENTITY_TYPES='["Product","Manufacturer","Brand"]'
# Result: ['[', '"', 'P', 'r', 'o', 'd', 'u', 'c', 't', '"', ',', ...]
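
The symptom can be reproduced in a few lines. This is a minimal illustration, assuming the old code path cast the raw string with `list()`; the PR description does not show the original code, so treat that mechanism as an assumption.

```python
import json

# The raw value exactly as it arrives from the environment (always a string).
raw = '["Product","Manufacturer","Brand"]'

# Assumed old behaviour: list() iterates a string character by character.
as_chars = list(raw)
print(as_chars[:5])  # ['[', '"', 'P', 'r', 'o']

# Parsing the same string as JSON yields the three intended names.
as_list = json.loads(raw)
print(as_list)  # ['Product', 'Manufacturer', 'Brand']
```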

Impact:

  • Entity extraction completely broken
  • LLM receives meaningless character arrays instead of entity type names
  • Custom entity types cannot be used effectively

🔧 Solution

Modified the get_env_value function in lightrag/utils.py to properly handle JSON parsing for list-type environment variables.

Key Changes:

  • Added JSON parsing logic specifically for list type environment variables
  • Implemented type validation to ensure parsed values are actually lists
  • Added comprehensive error handling with graceful fallback to defaults
  • Maintained full backward compatibility for all other environment variable types
  • Added informative logging for debugging invalid configurations
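
The key changes above could look roughly like the sketch below. It is not the code merged in lightrag/utils.py: the signature, the accepted boolean strings, and the log messages are assumptions made for illustration, and `DEMO_ENTITY_TYPES` is a hypothetical variable name.

```python
import json
import logging
import os

logger = logging.getLogger(__name__)


def get_env_value(env_key, default, value_type=str):
    """Illustrative sketch only; the real function may differ in detail."""
    value = os.getenv(env_key)
    if value is None or value == "":
        return default  # missing or empty variable: use the default

    if value_type is bool:
        # Assumed set of truthy strings.
        return value.lower() in ("true", "1", "yes", "t", "on")

    if value_type is list:
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError as exc:
            logger.warning("Invalid JSON in %s (%s); using default", env_key, exc)
            return default
        if isinstance(parsed, list):
            return parsed
        # Valid JSON, but not an array (for example an object or a number).
        logger.warning("%s is not a JSON array; using default", env_key)
        return default

    try:
        return value_type(value)  # str, int, float, ... behave as before
    except (ValueError, TypeError):
        return default


# Usage with a hypothetical variable name.
os.environ["DEMO_ENTITY_TYPES"] = '["Product","Brand"]'
print(get_env_value("DEMO_ENTITY_TYPES", ["Person"], list))  # ['Product', 'Brand']
```

Checking `isinstance(parsed, list)` after parsing matters because a value such as `{"a": 1}` is valid JSON but would still be unusable as a list of entity types.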

✅ After Fix

Now Works Correctly:

ENTITY_TYPES='["Product","Manufacturer","Brand","Store","Category","Service","Technology","Feature","Location","Price","Model"]'
# Result: ['Product', 'Manufacturer', 'Brand', 'Store', 'Category', 'Service', 'Technology', 'Feature', 'Location', 'Price', 'Model']

🧪 Testing

  • ✅ Valid JSON arrays are correctly parsed into Python lists
  • ✅ Invalid JSON gracefully falls back to default entity types
  • ✅ Non-list JSON objects fall back to defaults with appropriate warnings
  • ✅ Empty/missing environment variables use default values
  • ✅ Full backward compatibility maintained for string, int, and bool types
  • ✅ Entity extraction prompts now receive proper entity type lists
  • ✅ LLM can correctly classify entities using custom types

📋 Usage

Users can now set custom entity types using standard JSON array syntax:

export ENTITY_TYPES='["Product","Manufacturer","Brand","Store","Category","Service","Technology","Feature","Location","Price","Model"]'

The system will properly parse this into a Python list and use it in entity extraction prompts, allowing the LLM to correctly classify entities according to the specified custom types.

🔄 Backward Compatibility

This fix maintains 100% backward compatibility:

  • All existing environment variable handling continues to work unchanged
  • Default entity types are used when ENTITY_TYPES is not set
  • Other list-type environment variables (if any) will benefit from the same JSON parsing logic

- Improve handling of redundant quotes in entity and relation names, types, and keywords
- Add HTML tag cleaning and Chinese symbol conversion
- Filter out short numeric content and malformed text
- Enhance entity type validation with character filtering
• Add field count validation warnings
• Fix relationship field count (5→6)
• Change error logs to warnings
-   **Prompts**: Restructured prompts with clearer steps and quality guidelines. Simplified the relationship tuple by removing `relationship_strength`
-   **Model**: Updated default entity types to be more comprehensive and consistently capitalized (e.g., `Location`, `Product`)
… logging

• Add queue_name parameter to decorator
• Update all log messages with queue names
• Pass specific names for LLM and embedding
- Add JSON parsing for list env vars
- Update entity types example format
- Add list type support to get_env_value
…Prompt

- Fix "feild" → "field" typo
- Clarify delimiter spacing rules
@danielaskdd danielaskdd merged commit cdc4570 into HKUDS:main Aug 31, 2025
1 check passed
@danielaskdd danielaskdd deleted the fix-entity-type-env branch August 31, 2025 17:00