Generic Citrus Backend in tree_haver - Extended Feasibility Analysis

Executive Summary

After deep investigation, my recommendation has CHANGED:

YES, tree_haver SHOULD include a generic Citrus backend, but with careful scoping.

Key Discovery: The events Array

Citrus::Match objects have an events array whose first element, events[0], identifies the matched rule: either the rule name as a Symbol or a rule object that exposes that name.

match.events.first  # => :table, :keyvalue, :string, etc.

This provides a grammar-agnostic type system that we were missing!
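A minimal sketch of type extraction, assuming events.first is either a Symbol or an object that responds to name (both cases are handled because the exact shape may vary by Citrus version). FakeRule, FakeMatch, and rule_type are hypothetical names used so the example runs without the citrus gem.

```ruby
# Hypothetical stand-ins for Citrus objects, so the sketch runs without the gem.
FakeRule  = Struct.new(:name)
FakeMatch = Struct.new(:events)

# Returns the rule name for a match as a Symbol, or nil when none is available.
def rule_type(match)
  first = match.events.first
  case first
  when Symbol then first
  when nil then nil
  else
    # Assumption: a rule object exposes its name via #name.
    name = first.respond_to?(:name) ? first.name : nil
    name&.to_sym
  end
end

rule_type(FakeMatch.new([:table, 5]))                   # Symbol event
rule_type(FakeMatch.new([FakeRule.new(:keyvalue), 9]))  # object exposing #name
rule_type(FakeMatch.new([]))                            # no events, returns nil
```

Returning nil for unnamed matches gives callers a single way to tell anonymous matches apart from named rules.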

What tree_haver CAN Provide Generically

Layer 1: Generic Citrus Mechanics (tree_haver)

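module TreeHaver::Backends::Citrus
  class Node
    # source is the full input text, kept for row/column calculation
    def initialize(match, source)
      @match = match
      @source = source
    end

    def type
      @match.events.first  # Rule name as Symbol
    end

    def start_byte
      @match.offset
    end

    def end_byte
      @match.offset + @match.length
    end

    def start_point
      calculate_position(start_byte)
    end

    def end_point
      calculate_position(end_byte)
    end

    def text
      @match.string
    end

    def children
      @match.matches.map { |m| Node.new(m, @source) }
    end

    def captures
      @match.captures
    end

    private

    # Zero-based { row:, column: } for an offset into the source.
    # Assumes the offset counts the same units as String#[] (characters).
    def calculate_position(offset)
      before = @source[0, offset]
      last_newline = before.rindex("\n")
      { row: before.count("\n"), column: last_newline ? offset - last_newline - 1 : offset }
    end
  end
end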
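Row/column lookup from an offset can be done with a precomputed table of line starts, which avoids rescanning the source for every node. A minimal sketch, assuming offsets are byte offsets into the source; Point and LineIndex are hypothetical names, and tree_haver's real point type may differ.

```ruby
# Hypothetical Point value object with zero-based row and column.
Point = Struct.new(:row, :column)

# Precomputes the byte offset at which each line starts, then answers
# offset -> (row, column) queries with a binary search.
class LineIndex
  def initialize(source)
    @line_starts = [0]
    pos = 0
    source.each_line do |line|
      pos += line.bytesize
      # A new line starts only after a terminating newline.
      @line_starts << pos if line.end_with?("\n")
    end
  end

  # Zero-based row and byte column for a byte offset.
  def point_at(byte_offset)
    # Index of the first line start beyond the offset (nil if on the last line).
    after = @line_starts.bsearch_index { |start| start > byte_offset }
    row = (after || @line_starts.length) - 1
    Point.new(row, byte_offset - @line_starts[row])
  end
end

index = LineIndex.new("a = 1\n[table]\nb = 2\n")
index.point_at(0)  # first character of the first line
index.point_at(7)  # second byte of "[table]" on the second line
```

Building the index once per parse keeps each lookup logarithmic in the number of lines.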

Layer 2: Language Semantics (toml-merge, json-merge, etc.)

Each *-merge gem adds semantic understanding:

# In toml-merge
class Toml::Merge::NodeWrapper
  def initialize(node, backend:)
    @backend = backend
    @adapter =
      case backend
      when :tree_sitter
        TreeSitterAdapter.new(node)
      when :citrus
        CitrusAdapter.new(node)  # Uses tree_haver's generic backend
      else
        raise ArgumentError, "unknown backend: #{backend.inspect}"
      end
  end

  # Both backends report the same rule name, so no per-backend branch is needed.
  def table?
    @adapter.type.to_sym == :table
  end
end

Comparison: Before vs After Discovery

BEFORE (No type info):

❌ tree_haver: Can't provide types generically
✓ toml-merge: Must handle ALL Citrus mechanics + TOML semantics
   - Position tracking
   - Line calculation  
   - Type inference from value objects
   - TOML-specific logic

AFTER (With events[0]):

✓ tree_haver: Provides generic Citrus Node with types
   - Rule names as types (events[0])
   - Position tracking
   - Line/column calculation
   - Child traversal
   
✓ toml-merge: Focuses only on TOML semantics
   - Maps rule names to TOML concepts
   - table rule → Table semantics
   - keyvalue rule → Pair semantics
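The rule-name-to-concept mapping could be as small as a lookup table. A minimal sketch: TOML_CONCEPTS, GenericNode, and TomlSemantics are hypothetical names, and only the table and keyvalue entries come from the list above; a real grammar's full rule list would need checking.

```ruby
# Hypothetical mapping from Citrus rule names to TOML concepts.
TOML_CONCEPTS = {
  table: :table,
  keyvalue: :pair
}.freeze

# Stand-in for a generic backend node; only #type is needed here.
GenericNode = Struct.new(:type)

class TomlSemantics
  def initialize(node)
    @node = node
  end

  # The TOML concept for this node, or nil for purely syntactic rules.
  def concept
    TOML_CONCEPTS[@node.type.to_sym]
  end

  def table?
    concept == :table
  end

  def pair?
    concept == :pair
  end
end

TomlSemantics.new(GenericNode.new(:keyvalue)).pair?  # true
TomlSemantics.new(GenericNode.new("table")).table?   # true, String types work too
```

Keeping the table in toml-merge means tree_haver never needs to know what a TOML pair is.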

Architecture Recommendation

Option A: Generic Citrus Backend in tree_haver

tree_haver/
  backends/
    citrus/
      node.rb           # Generic Citrus::Match wrapper
      parser.rb         # Grammar loading and parsing
      language.rb       # Grammar registration
      point.rb          # Position (row/col) calculation
      
toml-merge/
  backends/
    citrus.rb           # Thin adapter using tree_haver
      # Maps tree_haver Citrus nodes to TOML semantics
      # Minimal code - just semantic mappings

Benefits:

  • ✅ tree_haver handles all Citrus mechanics
  • ✅ toml-merge focuses on TOML semantics only
  • ✅ Other *-merge gems can reuse tree_haver’s Citrus backend
  • ✅ Consistent API across all backends (MRI, Rust, FFI, Citrus)
  • ✅ Better separation of concerns

Tradeoffs:

  • More complexity in tree_haver (new backend to maintain)
  • But: Worth it if toml-merge proves Citrus is valuable
  • But: Provides foundation for future Citrus-based grammars

Option B: Citrus Backend Only in toml-merge

toml-merge/
  backends/
    citrus/
      match_wrapper.rb  # Wraps Citrus::Match
      parser.rb         # Parsing logic
      node_adapter.rb   # Full AST interface + TOML semantics

Benefits:

  • ✅ Keeps tree_haver simpler
  • ✅ Faster to implement initially
  • ✅ Can validate pattern before extraction

Tradeoffs:

  • More code in toml-merge
  • Duplication if other *-merge gems want Citrus
  • Harder to extract later

Recommended Path: Phased Approach

Phase 1: Validate in toml-merge (FIRST)

  1. Build complete Citrus backend in toml-merge
  2. Get it working and tested
  3. Validate the approach with real usage
  4. Document what’s generic vs TOML-specific

Phase 2: Extract to tree_haver (AFTER validation)

  1. Identify truly generic Citrus handling
  2. Move generic parts to tree_haver/backends/citrus
  3. Keep TOML-specific parts in toml-merge
  4. Update toml-merge to use tree_haver’s Citrus backend

Phase 3: Polish and document (FINAL)

  1. Add examples and documentation
  2. Make it easy for other *-merge gems to adopt
  3. Document grammar requirements

What Makes This Feasible Now

The events[0] Discovery:

  • ✅ Provides rule names as types
  • ✅ Works for ANY Citrus grammar
  • ✅ Allows generic Node implementation
  • ✅ Maps cleanly to tree-sitter’s type concept

Position Information:

  • ✅ offset and length from Citrus::Match
  • ✅ Can calculate line/column from offsets
  • ✅ Can provide both byte and point positions

Hierarchical Structure:

  • ✅ matches array provides children
  • ✅ captures provides named sub-matches
  • ✅ Can traverse generically
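Generic traversal needs nothing beyond type and children. A minimal depth-first sketch; TreeNode and walk are hypothetical names, with TreeNode standing in for a wrapped match.

```ruby
# Stand-in for a generic node: a type plus child nodes.
TreeNode = Struct.new(:type, :children)

# Depth-first, pre-order walk that passes each node and its depth to the block.
def walk(node, depth = 0, &visit)
  visit.call(node, depth)
  node.children.each { |child| walk(child, depth + 1, &visit) }
end

tree = TreeNode.new(:document, [
  TreeNode.new(:table, [TreeNode.new(:keyvalue, [])]),
  TreeNode.new(:keyvalue, [])
])

seen = []
walk(tree) { |node, depth| seen << [node.type, depth] }
# seen now lists :document, :table, the nested :keyvalue, then the top-level :keyvalue
```

Because the walk never inspects rule names, the same code serves any grammar.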

API Compatibility Analysis

tree-sitter Node API:

node.type          # String/Symbol
node.start_byte    # Integer
node.end_byte      # Integer  
node.start_point   # {row, column}
node.end_point     # {row, column}
node.text          # String
node.children      # Array<Node>
node.named_children # Array<Node>
node.field(name)   # Node

Citrus Generic Node (Proposed):

node.type          # Symbol (from events[0])     ✅
node.start_byte    # Integer (from offset)       ✅
node.end_byte      # Integer (offset + length)   ✅
node.start_point   # Calculate from offset       ✅
node.end_point     # Calculate from offset+len   ✅
node.text          # String (from match.string)  ✅
node.children      # Array<Node> (from matches)  ✅
node.named_children # Filter children            ✅ (can implement)
node.field(name)   # From captures[name]         ✅

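Compatibility: roughly 95% (an informal estimate, not a measurement). Every method has a mapping, though named_children needs a filtering rule. Close enough for the tree_haver abstraction!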
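named_children and field(name) are the two methods that need a convention rather than a direct mapping. A minimal sketch, assuming anonymous matches report a nil type and that a capture may be either a single match or a list of matches; StubNode is a hypothetical stand-in, not the Citrus API.

```ruby
# Stand-in for a wrapped match. `type` is nil for anonymous matches
# (literals, punctuation); `captures` maps names to one match or a list.
StubNode = Struct.new(:type, :children, :captures) do
  # Children produced by a named rule, mirroring tree-sitter's named_children.
  def named_children
    children.reject { |child| child.type.nil? }
  end

  # First capture for a name, or nil. Accepts String or Symbol names.
  def field(name)
    value = captures[name.to_sym]
    value.is_a?(Array) ? value.first : value
  end
end

key    = StubNode.new(:key, [], {})
equals = StubNode.new(nil, [], {})      # anonymous "=" literal
value  = StubNode.new(:string, [], {})
pair   = StubNode.new(:keyvalue, [key, equals, value], { key: [key], value: value })

pair.named_children.map(&:type)  # drops the anonymous "=" match
pair.field("key")                # same node as `key`
```

Whether a nil type is the right signal for "anonymous" is a design choice to confirm against real grammars during the toml-merge validation phase.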

Grammar Requirements

For a Citrus grammar to work with tree_haver’s generic backend:

  1. Must use Citrus PEG syntax - .citrus files
  2. Rule names become types - Use meaningful names
  3. Optional semantic layer - Can add custom classes like toml-rb does
  4. No special requirements - Events array is automatic

Example: Hypothetical JSON grammar

grammar JSON::Document
  rule object
    ('{' (pair (',' pair)*)? '}') <JSON::ObjectParser>
  end
  
  rule array
    ('[' (value (',' value)*)? ']') <JSON::ArrayParser>
  end
  
  rule pair
    (string ':' value) <JSON::PairParser>
  end
end

With tree_haver’s Citrus backend:

node.type  # => :object, :array, :pair
# Language-specific semantics in json-merge, not tree_haver

Other Citrus Grammars in the Wild

A search for public projects suggests Citrus is underutilized:

  • toml-rb - Active, maintained, production-ready
  • ❓ Others - Few public Citrus grammar projects found

Implication:

  • This would be pioneering work
  • Could encourage more Citrus adoption
  • Provides foundation for future grammars
  • But: Limited existing ecosystem to leverage

Risk Assessment

Risk 1: Citrus Maintenance

  • Risk: Citrus gem might not be actively maintained
  • Check: Last update, community activity
  • Mitigation: It’s pure Ruby, stable, can fork if needed

Risk 2: Performance

  • Risk: Citrus (pure Ruby) is likely slower than tree-sitter (C)
  • Impact: Medium - acceptable for fallback backend
  • Mitigation: Keep tree-sitter as default, Citrus as fallback
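The default-then-fallback mitigation could look like the following sketch. pick_backend and the availability lambdas are hypothetical; real checks would probe whether the native tree-sitter library or the citrus gem actually loads.

```ruby
# Returns the first backend in preferred_order whose availability check passes.
# availability maps backend names to zero-argument callables returning true/false.
def pick_backend(preferred_order, availability)
  chosen = preferred_order.find do |name|
    check = availability[name]
    check && check.call
  end
  raise ArgumentError, "no backend available among #{preferred_order.inspect}" unless chosen
  chosen
end

availability = {
  tree_sitter: -> { false },  # pretend the native library is missing
  citrus: -> { true }         # pure Ruby, assumed loadable
}

pick_backend(%i[tree_sitter citrus], availability)  # falls back to :citrus
```

Ordering the list with tree-sitter first keeps the fast path as the default and makes Citrus purely a fallback.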

Risk 3: Grammar Compatibility

  • Risk: Not all Citrus grammars may work well
  • Impact: Low - can document requirements
  • Mitigation: Test with toml-rb first, learn patterns

Risk 4: Maintenance Burden

  • Risk: Another backend to maintain in tree_haver
  • Impact: Medium - more test surface
  • Mitigation: Staged approach (validate in toml-merge first)

Value Proposition

For tree_haver:

  • ✅ Provides pure-Ruby fallback for any format that has a Citrus grammar
  • ✅ Completes the backend story (native + FFI + pure Ruby)
  • ✅ Pioneering work in Ruby parsing abstraction
  • ✅ Foundation for future Citrus grammar ecosystem

For toml-merge:

  • ✅ Simpler codebase (delegates to tree_haver)
  • ✅ Focuses on TOML semantics only
  • ✅ Consistent with tree_haver’s design
  • ✅ Easy to maintain

For Other *-merge gems:

  • ✅ Can create Citrus grammars for their formats
  • ✅ Reuse tree_haver’s Citrus backend
  • ✅ Pure Ruby option without native dependencies
  • ✅ Lower barrier to entry

Final Recommendation

Immediate (Now):

Build Citrus backend in toml-merge first

  • Validate the approach
  • Work out edge cases
  • Understand generic vs specific boundary
  • Get it working in production

Near-term (After validation):

Extract to tree_haver if successful

  • Move generic Citrus handling to tree_haver
  • Document the pattern
  • Make it easy for others to adopt
  • Publish findings

Long-term (Future):

Encourage Citrus grammar ecosystem

  • Document how to create grammars for tree_haver
  • Support other *-merge gems adopting Citrus
  • Build examples (JSON, YAML, etc.)

Conclusion

YES, a generic Citrus backend in tree_haver makes sense, BUT:

  1. Validate in toml-merge first - Don’t prematurely extract
  2. Extract after proving - Once pattern is solid, move to tree_haver
  3. Document well - Make it easy for others to follow

The discovery of events[0] containing rule names changes the feasibility from “limited value” to “solid foundation”. This enables tree_haver to provide a true generic Citrus backend that parallels its tree-sitter backends.

The phased approach reduces risk while enabling the vision.