Generic Citrus Backend in tree_haver - Extended Feasibility Analysis
Executive Summary
After deep investigation, my recommendation has CHANGED:
YES, tree_haver SHOULD include a generic Citrus backend, but with careful scoping.
Key Discovery: The events Array
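Citrus::Match objects have an events array whose first element, events[0], identifies the rule that produced the match, and that rule's name is a Symbol.
match.events.first # identifies the rule: :table, :keyvalue, :string, etc.
Depending on the Citrus version, events[0] may be the rule object itself rather than a bare Symbol, in which case the name is events[0].name.
This provides the grammar-agnostic type system that we were missing!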
What tree_haver CAN Provide Generically
Layer 1: Generic Citrus Mechanics (tree_haver)
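module TreeHaver::Backends::Citrus
  # Generic wrapper around a Citrus::Match.
  class Node
    def initialize(match)
      @match = match
    end

    # Rule name as a Symbol. events.first may be a rule object rather than
    # a bare Symbol, so ask it for its name in that case.
    def type
      rule = @match.events.first
      rule.is_a?(Symbol) ? rule : rule.name
    end

    def start_byte
      @match.offset
    end

    def end_byte
      @match.offset + @match.length
    end

    def start_point
      calculate_position(@match.offset)
    end

    def end_point
      calculate_position(@match.offset + @match.length)
    end

    def text
      @match.string
    end

    def children
      @match.matches.map { |m| Node.new(m) }
    end

    def captures
      @match.captures
    end

    private

    # Zero-based row/column of an offset into the full input.
    # Assumes the match exposes the whole input text via input.to_str.
    def calculate_position(offset)
      before = @match.input.to_str[0, offset]
      { row: before.count("\n"), column: offset - (before.rindex("\n") || -1) - 1 }
    end
  end
end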
Layer 2: Language Semantics (toml-merge, json-merge, etc.)
Each *-merge gem adds semantic understanding:
# In toml-merge
class Toml::Merge::NodeWrapper
  def initialize(node, backend:)
    @backend = backend # stored so that predicates such as table? can branch on it
    case backend
    when :tree_sitter
      @adapter = TreeSitterAdapter.new(node)
    when :citrus
      @adapter = CitrusAdapter.new(node) # Uses tree_haver's generic backend
    else
      raise ArgumentError, "unknown backend: #{backend.inspect}"
    end
  end

  def table?
    case @backend
    when :tree_sitter
      @adapter.type == :table
    when :citrus
      @adapter.type == :table # Same rule name!
    end
  end
end
Comparison: Before vs After Discovery
BEFORE (No type info):
❌ tree_haver: Can't provide types generically
✓ toml-merge: Must handle ALL Citrus mechanics + TOML semantics
  - Position tracking
  - Line calculation
  - Type inference from value objects
  - TOML-specific logic
AFTER (With events[0]):
✓ tree_haver: Provides generic Citrus Node with types
  - Rule names as types (events[0])
  - Position tracking
  - Line/column calculation
  - Child traversal
✓ toml-merge: Focuses only on TOML semantics
  - Maps rule names to TOML concepts
    - table rule → Table semantics
    - keyvalue rule → Pair semantics
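The toml-merge side of this split could look roughly like the sketch below. It is not toml-merge's actual code: TomlSemantics, SEMANTIC_KINDS, and the FakeNode stand-in are hypothetical names, and the only assumption about tree_haver is that a node exposes type as a Symbol.

```ruby
# Hypothetical stand-in for a tree_haver node; only #type is assumed.
FakeNode = Struct.new(:type)

module TomlSemantics
  # Rule name (from the grammar) => TOML concept. The rule names :table and
  # :keyvalue come from the comparison above; the concept names are illustrative.
  SEMANTIC_KINDS = {
    table: :table,
    keyvalue: :pair,
  }.freeze

  # Returns the TOML concept for a node, or nil when the rule has no mapping.
  def self.kind_of(node)
    SEMANTIC_KINDS[node.type]
  end

  def self.table?(node)
    kind_of(node) == :table
  end

  def self.pair?(node)
    kind_of(node) == :pair
  end
end
```

Because the mapping is a plain lookup on the rule name, nothing here depends on how the node was parsed, which is the point of moving the mechanics into tree_haver.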
Architecture Recommendation
Option A: Citrus Backend in tree_haver ⭐ RECOMMENDED
tree_haver/
  backends/
    citrus/
      node.rb      # Generic Citrus::Match wrapper
      parser.rb    # Grammar loading and parsing
      language.rb  # Grammar registration
      point.rb     # Position (row/col) calculation
toml-merge/
  backends/
    citrus.rb      # Thin adapter using tree_haver
                   # Maps tree_haver Citrus nodes to TOML semantics
                   # Minimal code - just semantic mappings
Benefits:
- ✅ tree_haver handles all Citrus mechanics
- ✅ toml-merge focuses on TOML semantics only
- ✅ Other *-merge gems can reuse tree_haver’s Citrus backend
- ✅ Consistent API across all backends (MRI, Rust, FFI, Citrus)
- ✅ Better separation of concerns
Tradeoffs:
- More complexity in tree_haver (new backend to maintain)
- But: If toml-merge proves Citrus is valuable, the added complexity is worth it
- But: Provides foundation for future Citrus-based grammars
Option B: Citrus Backend Only in toml-merge
toml-merge/
  backends/
    citrus/
      match_wrapper.rb  # Wraps Citrus::Match
      parser.rb         # Parsing logic
      node_adapter.rb   # Full AST interface + TOML semantics
Benefits:
- ✅ Keeps tree_haver simpler
- ✅ Faster to implement initially
- ✅ Can validate pattern before extraction
Tradeoffs:
- More code in toml-merge
- Duplication if other *-merge gems want Citrus
- Harder to extract later
Recommended Approach: Staged Implementation
Phase 1: Validate in toml-merge (FIRST)
- Build complete Citrus backend in toml-merge
- Get it working and tested
- Validate the approach with real usage
- Document what’s generic vs TOML-specific
Phase 2: Extract to tree_haver (AFTER validation)
- Identify truly generic Citrus handling
- Move generic parts to tree_haver/backends/citrus
- Keep TOML-specific parts in toml-merge
- Update toml-merge to use tree_haver’s Citrus backend
Phase 3: Polish and document (FINAL)
- Add examples and documentation
- Make it easy for other *-merge gems to adopt
- Document grammar requirements
What Makes This Feasible Now
The events[0] Discovery:
- ✅ Provides rule names as types
- ✅ Works for ANY Citrus grammar
- ✅ Allows generic Node implementation
- ✅ Maps cleanly to tree-sitter’s type concept
Position Information:
- ✅ offset and length from Citrus::Match
- ✅ Can calculate line/column from offsets
- ✅ Can provide both byte and point positions
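One way the offset-to-line/column step could work is sketched below. LineIndex and point_for are hypothetical names, not part of Citrus or tree_haver. The sketch assumes offsets are counted in the same unit as String#index (characters) and that "\n" ends a line. Whether Citrus offsets count bytes or characters should be verified before relying on them as byte positions for multibyte input.

```ruby
# Hypothetical helper: maps an offset into the source text to a zero-based
# row/column pair using a precomputed table of line start offsets.
class LineIndex
  def initialize(source)
    # Offsets at which each line starts; line 0 always starts at offset 0.
    @line_starts = [0]
    pos = 0
    while (newline = source.index("\n", pos))
      pos = newline + 1
      @line_starts << pos
    end
  end

  def point_for(offset)
    # Binary search for the first line that starts after the offset;
    # the line containing the offset is the one just before it.
    after = @line_starts.bsearch_index { |start| start > offset }
    row = after ? after - 1 : @line_starts.length - 1
    { row: row, column: offset - @line_starts[row] }
  end
end

index = LineIndex.new("a = 1\nb = 2\n")
point = index.point_for(8) # offset 8 is the "=" on the second line
```

Building the table once per parse keeps each lookup logarithmic in the number of lines, which matters if every node asks for both its start and end point.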
Hierarchical Structure:
- ✅ matches array provides children
- ✅ captures provides named sub-matches
- ✅ Can traverse generically
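Generic traversal needs nothing beyond type and children. The sketch below uses a hypothetical SketchNode stand-in instead of a real Citrus-backed node, and TreeWalk is an illustrative name, not an existing tree_haver module.

```ruby
# Hypothetical stand-in for a generic node; a real node would wrap a
# Citrus::Match and build its children from the matches array.
SketchNode = Struct.new(:type, :children)

module TreeWalk
  # Depth-first, pre-order walk. Yields each node with its depth, or
  # returns an Enumerator when called without a block.
  def self.each_node(node, depth = 0, &block)
    return enum_for(:each_node, node, depth) unless block

    block.call(node, depth)
    node.children.each { |child| each_node(child, depth + 1, &block) }
  end
end

tree = SketchNode.new(:document, [
  SketchNode.new(:table, [
    SketchNode.new(:keyvalue, []),
    SketchNode.new(:keyvalue, []),
  ]),
])

types = TreeWalk.each_node(tree).map { |node, _depth| node.type }
```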
API Compatibility Analysis
tree-sitter Node API:
node.type            # String/Symbol
node.start_byte      # Integer
node.end_byte        # Integer
node.start_point     # {row, column}
node.end_point       # {row, column}
node.text            # String
node.children        # Array<Node>
node.named_children  # Array<Node>
node.field(name)     # Node
Citrus Generic Node (Proposed):
node.type            # Symbol (from events[0]) ✅
node.start_byte      # Integer (from offset) ✅
node.end_byte        # Integer (offset + length) ✅
node.start_point     # Calculate from offset ✅
node.end_point       # Calculate from offset+len ✅
node.text            # String (from match.string) ✅
node.children        # Array<Node> (from matches) ✅
node.named_children  # Filter children ✅ (can implement)
node.field(name)     # From captures[name] ✅
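Compatibility: roughly 95% (an estimate, not a measurement). Every method listed above has a Citrus-side counterpart, which is close enough for the tree_haver abstraction!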
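The two entries that need more than a direct delegation are named_children and field(name). A minimal sketch of both follows. StubMatch and ApiSketchNode are hypothetical, the stub only mimics the shape assumed here (child matches plus a Hash of captures whose values are Arrays), and treating "named" as "produced by a rule with a name" is an assumption, not tree-sitter's definition.

```ruby
# Hypothetical stand-in for the parts of a match used below: a rule name
# (nil for anonymous pieces such as literals), child matches, and captures.
StubMatch = Struct.new(:rule_name, :matches, :captures)

class ApiSketchNode
  def initialize(match)
    @match = match
  end

  def type
    @match.rule_name
  end

  def children
    @match.matches.map { |m| ApiSketchNode.new(m) }
  end

  # Assumption: a child counts as "named" when its rule has a name.
  def named_children
    children.reject { |child| child.type.nil? }
  end

  # Returns the first capture stored under name, or nil if there is none.
  def field(name)
    found = (@match.captures[name] || []).first
    found && ApiSketchNode.new(found)
  end
end

key   = StubMatch.new(:key, [], {})
equal = StubMatch.new(nil, [], {})
value = StubMatch.new(:value, [], {})
pair  = StubMatch.new(:keyvalue, [key, equal, value], { key: [key], value: [value] })
node  = ApiSketchNode.new(pair)
```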
Grammar Requirements
For a Citrus grammar to work with tree_haver’s generic backend:
- Must use Citrus PEG syntax - .citrus files
- Rule names become types - Use meaningful names
- Optional semantic layer - Can add custom classes like toml-rb does
- No special requirements - Events array is automatic
Example: Hypothetical JSON grammar
grammar JSON::Document
  rule object
    ('{' (pair (',' pair)*)? '}') <JSON::ObjectParser>
  end
  rule array
    ('[' (value (',' value)*)? ']') <JSON::ArrayParser>
  end
  rule pair
    (string ':' value) <JSON::PairParser>
  end
end
With tree_haver’s Citrus backend:
node.type # => :object, :array, :pair
# Language-specific semantics in json-merge, not tree_haver
Other Citrus Grammars in the Wild
A search for public Citrus grammars suggests Citrus is underutilized:
- ✅ toml-rb - Active, maintained, production-ready
- ❓ Others - Few public Citrus grammar projects found
Implication:
- This would be pioneering work
- Could encourage more Citrus adoption
- Provides foundation for future grammars
- But: Limited existing ecosystem to leverage
Risk Assessment
Risk 1: Citrus Maintenance
- Risk: Citrus gem might not be actively maintained
- Check: Last update, community activity
- Mitigation: It’s pure Ruby, stable, can fork if needed
Risk 2: Performance
- Risk: Citrus (pure Ruby) is likely slower than tree-sitter (C)
- Impact: Medium - acceptable for fallback backend
- Mitigation: Keep tree-sitter as default, Citrus as fallback
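The "tree-sitter by default, Citrus as fallback" mitigation could be sketched as a priority-ordered probe. BackendSelection and its loaders are hypothetical and do not reflect tree_haver's actual selection logic. The loaders here are plain lambdas, where a real one would require the backend's library and could raise LoadError.

```ruby
# Hypothetical backend selection: try each candidate in priority order and
# return the name of the first one whose loader succeeds.
module BackendSelection
  def self.pick(candidates)
    candidates.each do |name, loader|
      begin
        loader.call
        return name
      rescue LoadError
        next # backend unavailable, fall through to the next candidate
      end
    end
    nil # no backend could be loaded
  end
end

# Hashes keep insertion order, so tree_sitter is tried before citrus.
candidates = {
  tree_sitter: -> { raise LoadError, "native library not available" },
  citrus: -> { true },
}
chosen = BackendSelection.pick(candidates)
```

With this shape the slower pure-Ruby path is only taken when the native backend cannot be loaded at all.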
Risk 3: Grammar Compatibility
- Risk: Not all Citrus grammars may work well
- Impact: Low - can document requirements
- Mitigation: Test with toml-rb first, learn patterns
Risk 4: Maintenance Burden
- Risk: Another backend to maintain in tree_haver
- Impact: Medium - more test surface
- Mitigation: Staged approach (validate in toml-merge first)
Value Proposition
For tree_haver:
- ✅ Provides pure-Ruby fallback for ANY grammar
- ✅ Completes the backend story (native + FFI + pure Ruby)
- ✅ Pioneering work in Ruby parsing abstraction
- ✅ Foundation for future Citrus grammar ecosystem
For toml-merge:
- ✅ Simpler codebase (delegates to tree_haver)
- ✅ Focuses on TOML semantics only
- ✅ Consistent with tree_haver’s design
- ✅ Easy to maintain
For Other *-merge gems:
- ✅ Can create Citrus grammars for their formats
- ✅ Reuse tree_haver’s Citrus backend
- ✅ Pure Ruby option without native dependencies
- ✅ Lower barrier to entry
Final Recommendation
Immediate (Now):
Build Citrus backend in toml-merge first
- Validate the approach
- Work out edge cases
- Understand generic vs specific boundary
- Get it working in production
Near-term (After validation):
Extract to tree_haver if successful
- Move generic Citrus handling to tree_haver
- Document the pattern
- Make it easy for others to adopt
- Publish findings
Long-term (Future):
Encourage Citrus grammar ecosystem
- Document how to create grammars for tree_haver
- Support other *-merge gems adopting Citrus
- Build examples (JSON, YAML, etc.)
Conclusion
YES, a generic Citrus backend in tree_haver makes sense, BUT:
- Validate in toml-merge first - Don’t prematurely extract
- Extract after proving - Once pattern is solid, move to tree_haver
- Document well - Make it easy for others to follow
The discovery of events[0] containing rule names changes the feasibility from “limited value” to “solid foundation”. This enables tree_haver to provide a true generic Citrus backend that parallels its tree-sitter backends.
The phased approach reduces risk while enabling the vision.