Architecture Decision: Where Should Citrus Backend Live?

TL;DR Recommendation

Start in toml-merge, extract to tree_haver after validation

Key Discovery

Citrus::Match objects have an events array where events[0] is the rule name (Symbol).
This provides a grammar-agnostic type system!

match.events.first  # => :table, :keyvalue, :string, etc.

Because the rule name is available on every match regardless of grammar, we CAN build a generic Citrus backend.
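
A minimal sketch of why this makes the type lookup grammar-agnostic. FakeMatch and FakeRule are hypothetical stand-ins (not Citrus classes), so the sketch runs without the citrus gem. The second branch is an assumption worth checking against the installed Citrus version, in case events.first turns out to be a rule object carrying the name rather than a bare Symbol:

```ruby
# Hypothetical stand-ins so the sketch runs without the citrus gem.
FakeMatch = Struct.new(:events)
FakeRule  = Struct.new(:name)

# Returns the rule name for a match as a Symbol, or nil if there is none.
# The lookup is identical whichever grammar produced the match.
def rule_name(match)
  first = match.events.first
  return first if first.is_a?(Symbol)   # events[0] is already the name

  first&.name&.to_sym                   # assumed: a rule object exposing #name
end

toml_match = FakeMatch.new([:table])
json_match = FakeMatch.new([FakeRule.new(:object)])

p rule_name(toml_match)
p rule_name(json_match)
```

Nothing in rule_name refers to TOML, which is the property the generic backend relies on.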

Three Options Compared

Option 1: Citrus Backend Only in toml-merge

toml-merge/
  lib/toml/merge/
    backends/
      tree_sitter.rb          # Existing
      citrus.rb               # New - entry point, requires the files below
      citrus/                 # New - full implementation
        ├── match_wrapper.rb  # Wraps Citrus::Match
        ├── parser.rb         # Parsing
        └── node_adapter.rb   # Full Node interface

Pros:

  • ✅ Fastest to implement
  • ✅ Keeps tree_haver simple
  • ✅ Can iterate quickly
  • ✅ No cross-gem coordination

Cons:

  • ❌ All Citrus logic in toml-merge
  • ❌ Other *-merge gems must duplicate
  • ❌ Harder to extract later
  • ❌ Mixing generic + TOML-specific

Option 2: Citrus Backend Only in tree_haver

tree_haver/
  lib/tree_haver/backends/
    citrus/
      node.rb               # Generic Citrus::Match wrapper
      parser.rb             # Parsing
      language.rb           # Grammar loading
      point.rb              # Position calculation

toml-merge/
  lib/toml/merge/
    backends/
      tree_sitter.rb        # Uses tree_haver
      citrus.rb             # Thin adapter - just TOML semantics

Pros:

  • ✅ Clean separation (generic vs semantic)
  • ✅ Other *-merge gems can reuse
  • ✅ Consistent with tree_haver design
  • ✅ Promotes Citrus ecosystem

Cons:

  • ❌ Unproven architecture
  • ❌ More upfront complexity
  • ❌ Harder to change if wrong
  • ❌ Cross-gem coordination needed

Option 3: Staged Approach (Recommended)

Phase 1: Build in toml-merge

toml-merge/backends/citrus/
  ├── Full implementation
  └── Clearly marked: generic vs TOML-specific

Phase 2: Extract to tree_haver (after validation)

tree_haver/backends/citrus/
  └── Generic parts moved here

toml-merge/backends/citrus/  
  └── Only TOML-specific parts remain

Pros:

  • ✅ Low risk - validate before extraction
  • ✅ Fast initial implementation
  • ✅ Learn the right boundaries
  • ✅ Can refine before making it generic
  • ✅ Benefits of both approaches

Cons:

  • ❌ More steps overall
  • ❌ Temporary duplication during transition

Both cons are temporary: they apply only until the extraction is complete.

Decision Matrix

Criteria                     Only toml-merge   Only tree_haver   Staged
Time to first working version  Fast ✅          Slow ❌           Fast ✅
Risk of wrong abstraction    Low ✅            High ❌           Low ✅
Reusability                  None ❌           High ✅           High ✅
Separation of concerns       Poor ❌           Excellent ✅      Excellent ✅
Flexibility to iterate       High ✅           Low ❌            High ✅
Long-term maintenance        Higher ❌         Lower ✅          Lower ✅
Implementation effort        Medium            High              Medium

Winner: Staged Approach - Best of both worlds


Implementation Plan: Staged Approach

Stage 1: Build in toml-merge (Weeks 1-2)

Goal: Get Citrus backend working, learn the patterns

# lib/toml/merge/backends/citrus.rb
module Toml::Merge::Backends
  module Citrus
    # Mark what's generic with comments
    class MatchWrapper  # GENERIC - could move to tree_haver
      def initialize(match)
        @match = match
      end
      
      def type
        @match.events.first  # Rule name
      end
      
      def start_byte
        @match.offset
      end
      
      # ... etc - all generic Citrus mechanics
    end
    
    class TomlNodeAdapter  # TOML-SPECIFIC - stays in toml-merge
      def initialize(wrapped_match)
        @wrapped = wrapped_match
      end
      
      def table?
        @wrapped.type == :table
      end
      
      # ... TOML semantics
    end
  end
end
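
One way to exercise this split before any real grammar is wired in is to feed the wrapper a stub. The sketch below is self-contained: StubMatch is a hypothetical stand-in that models only the readers the wrapper uses, the two classes mirror the ones above without the module nesting, and end_byte and text are illustrative additions rather than part of the plan:

```ruby
# Hypothetical stand-in for Citrus::Match; models only what the wrapper reads.
StubMatch = Struct.new(:events, :offset, :string)

# GENERIC: mirrors MatchWrapper above, plus two derived readers.
class MatchWrapper
  def initialize(match)
    @match = match
  end

  def type
    @match.events.first
  end

  def start_byte
    @match.offset
  end

  # Matched source text.
  def text
    @match.string
  end

  # Exclusive end offset, derived from the byte length of the matched text.
  def end_byte
    start_byte + text.bytesize
  end
end

# TOML-SPECIFIC: mirrors TomlNodeAdapter above.
class TomlNodeAdapter
  def initialize(wrapped_match)
    @wrapped = wrapped_match
  end

  def table?
    @wrapped.type == :table
  end
end

wrapped = MatchWrapper.new(StubMatch.new([:table], 10, "[server]"))
adapter = TomlNodeAdapter.new(wrapped)

p wrapped.type
p wrapped.end_byte
p adapter.table?
```

Because TomlNodeAdapter only calls type on the object it is given, it can later receive a tree_haver node instead of a MatchWrapper without changes, which is the boundary Stage 3 depends on.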

Deliverables:

  • Working Citrus backend
  • Full test coverage
  • Documentation of generic vs specific
  • Performance benchmarks

Stage 2: Validate & Refine (Weeks 3-4)

Goal: Use in production, find edge cases

Tasks:

  • Deploy to production
  • Gather metrics
  • Fix bugs
  • Refine boundaries
  • Document extraction plan

Success Criteria:

  • All tests passing
  • Performance acceptable
  • Clear boundary identified
  • Ready to extract

Stage 3: Extract to tree_haver (Weeks 5-6)

Goal: Move generic parts to tree_haver

# tree_haver/lib/tree_haver/backends/citrus.rb
module TreeHaver::Backends
  module Citrus
    class Node  # Extracted from toml-merge
      # Generic Citrus::Match wrapper
    end
    
    class Parser
      # Generic grammar loading/parsing
    end
  end
end

# toml-merge/lib/toml/merge/backends/citrus.rb
module Toml::Merge::Backends
  module Citrus
    # Now just uses tree_haver + adds TOML semantics
    class Adapter
      def initialize(tree_haver_node)
        @node = tree_haver_node
      end
      
      def table?
        @node.type == :table
      end
      # ... TOML-specific only
    end
  end
end
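
During the transition, a constant alias is one possible way to keep existing toml-merge call sites working while the class itself lives in tree_haver. This is a sketch under assumptions: the Node body is a placeholder, and keeping the old MatchWrapper name around is a suggestion, not something the plan requires:

```ruby
module TreeHaver
  module Backends
    module Citrus
      # Placeholder for the extracted generic wrapper.
      class Node
        def initialize(match)
          @match = match
        end

        def type
          @match.events.first
        end
      end
    end
  end
end

module Toml
  module Merge
    module Backends
      module Citrus
        # Transitional alias: old name, new home. Remove once callers migrate.
        MatchWrapper = TreeHaver::Backends::Citrus::Node
      end
    end
  end
end

# Hypothetical stand-in for a Citrus match.
stub = Struct.new(:events).new([:table])
p Toml::Merge::Backends::Citrus::MatchWrapper.new(stub).type
```

One detail to watch in both gems: inside a module that is itself named Citrus, a bare reference such as Citrus::Match resolves to the enclosing module first, so the gem's own constant needs the leading-colon form ::Citrus::Match.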

Deliverables:

  • tree_haver gains Citrus backend
  • toml-merge simplified
  • All tests still passing
  • Documentation updated

Stage 4: Polish & Document (Week 7)

Goal: Make it easy for others to use

Tasks:

  • Write tree_haver Citrus guide
  • Document grammar requirements
  • Add examples
  • Update READMEs
  • Blog post/announcement

What Goes Where (After Extraction)

tree_haver (Generic Citrus Mechanics)

Purpose: Make ANY Citrus grammar work like tree-sitter

# Generic capabilities:
- Wrap Citrus::Match
- Extract type from events[0]
- Provide position info (bytes + points)
- Child traversal
- Capture access
- Text extraction

Example usage:

# Works with ANY Citrus grammar
language = TreeHaver::Language.from_citrus_grammar(
  path: "path/to/grammar.citrus",
  grammar_module: MyFormat::Document
)

parser = TreeHaver::Parser.new
parser.language = language
tree = parser.parse(source)

node = tree.root_node
node.type        # => :object (from grammar rule name)
node.start_byte  # => 0
node.children    # => [...]
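
Of the generic capabilities listed above, position info is the one that needs actual computation, since a match supplies a byte offset but not a row and column. A minimal sketch, assuming zero-based rows and byte-based columns (the tree-sitter convention); Point and point_at are hypothetical names:

```ruby
Point = Struct.new(:row, :column)

# Convert a byte offset into a zero-based (row, column) pair.
# The column counts bytes from the start of the line.
def point_at(source, byte_offset)
  bytes  = source.b                                # binary copy: safe to cut anywhere
  offset = byte_offset.clamp(0, bytes.bytesize)
  prefix = bytes.byteslice(0, offset)

  row          = prefix.count("\n")
  last_newline = prefix.rindex("\n")
  column       = last_newline ? offset - last_newline - 1 : offset

  Point.new(row, column)
end

source = "a = 1\n[t]\nk = 2\n"
p point_at(source, 6).to_a    # start of the "[t]" line
p point_at(source, 12).to_a   # the "=" on the third line
```

This rescans the prefix on every call. If profiling shows it matters, a precomputed array of line-start offsets with a binary search would be the usual replacement.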

toml-merge (TOML Semantics)

Purpose: Understand TOML-specific structure

# TOML-specific knowledge:
- table rule => Table semantics
- keyvalue rule => Pair semantics  
- array rule => Array semantics
- Comment handling
- Table header extraction
- Key name extraction
- Value parsing

Example usage:

analysis = Toml::Merge::FileAnalysis.new(
  source,
  backend: :citrus  # Uses tree_haver's Citrus backend
)

node = analysis.statements.first
node.table?      # => true (TOML-specific method)
node.table_name  # => "section" (TOML-specific)
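
As an illustration of what stays on the TOML side, the sketch below derives table_name from a generic node's text. GenericNode and TomlAdapter are hypothetical, and the regular expression handles only a header that sits alone on the first line of the match; quoted keys, dotted keys and trailing comments would be better served by the grammar's own captures:

```ruby
# Hypothetical generic node, standing in for what tree_haver would return.
GenericNode = Struct.new(:type, :text)

class TomlAdapter
  HEADER = /\A\[{1,2}\s*(.*?)\s*\]{1,2}\z/

  def initialize(node)
    @node = node
  end

  def table?
    @node.type == :table
  end

  # Header text without its brackets, or nil when this is not a table
  # or the first line is not a plain header.
  def table_name
    return nil unless table?

    first_line = @node.text.lines.first.to_s.strip
    first_line[HEADER, 1]
  end
end

table = TomlAdapter.new(GenericNode.new(:table, "[section]\nkey = 1\n"))
pair  = TomlAdapter.new(GenericNode.new(:keyvalue, "key = 1"))

p table.table_name
p pair.table_name
```

Nothing here touches Citrus directly: the adapter needs only type and text, so it behaves the same over either backend.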

Risk Mitigation

Risk: “What if we extract wrong?”

Mitigation: Stage 2 validation finds issues before extraction

Risk: “What if boundaries are unclear?”

Mitigation: Clear commenting during Stage 1, refined in Stage 2

Risk: “What if no one else uses Citrus?”

Mitigation: Still valuable for toml-merge portability

Risk: “What if performance is bad?”

Mitigation: Benchmark in Stage 1, confirm under production load in Stage 2, optimize before extraction
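
A sketch of how the "within 2x of tree-sitter" target from the Success Metrics could be checked mechanically. best_time and within_factor? are hypothetical helpers, and the two timed blocks are placeholder workloads; in practice each block would parse the same TOML fixture through one backend:

```ruby
# Run the block `runs` times and return the fastest wall-clock time in seconds.
def best_time(runs: 5)
  Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end.min
end

# True when the candidate is no slower than `factor` times the baseline.
def within_factor?(candidate_seconds, baseline_seconds, factor: 2.0)
  candidate_seconds <= baseline_seconds * factor
end

# Placeholder workloads, NOT real parser calls.
tree_sitter_time = best_time { 1_000.times { "a = 1".split(" = ") } }
citrus_time      = best_time { 1_000.times { "a = 1".partition(" = ") } }

puts format("tree-sitter: %.6fs, citrus: %.6fs", tree_sitter_time, citrus_time)
puts "within 2x: #{within_factor?(citrus_time, tree_sitter_time)}"
```

Taking the minimum of several runs reduces noise from GC pauses and scheduling, which matters when the pass/fail threshold is a ratio.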


Success Metrics

Stage 1 Success:

  • Citrus backend passes all toml-merge tests
  • Performance within 2x of tree-sitter
  • Clear generic/specific boundary documented

Stage 2 Success:

  • Used in production without issues
  • Edge cases identified and handled
  • Extraction plan documented

Stage 3 Success:

  • tree_haver has Citrus backend
  • toml-merge code reduced
  • All tests passing
  • Performance maintained

Stage 4 Success:

  • Documentation complete
  • Examples working
  • Other gems can adopt pattern
  • Community feedback positive

Timeline

Weeks 1-2:  Build in toml-merge
Weeks 3-4:  Validate & refine
Weeks 5-6:  Extract to tree_haver
Week 7:     Polish & document

Total: ~7 weeks to complete architecture


Conclusion

Staged approach is the clear winner:

  1. Low risk - validate before committing
  2. Fast start - no cross-gem coordination needed
  3. Right abstractions - learn before extracting
  4. Long-term benefits - ends with clean architecture
  5. Flexibility - can stop after Stage 1 if needed

Start building the Citrus backend in toml-merge NOW.

Extract to tree_haver once we’ve learned what truly belongs there.


Next Actions

  1. Create lib/toml/merge/backends/citrus/ directory
  2. Implement MatchWrapper (generic part)
  3. Implement TomlNodeAdapter (specific part)
  4. Add backend selection logic
  5. Write tests
  6. Measure performance
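
For step 4, backend selection could start as small as the sketch below. BackendSelection is a hypothetical module, and the preference order (tree-sitter when present, Citrus as the portable fallback) is an assumption drawn from the portability argument above, not a settled decision:

```ruby
module BackendSelection
  # Assumed preference: tree-sitter when present, Citrus as the fallback.
  PREFERENCE = %i[tree_sitter citrus].freeze

  # requested: Symbol or nil (auto-select).
  # available: Array of backend names usable in this environment.
  def self.choose(requested, available:)
    if requested
      return requested if available.include?(requested)

      raise ArgumentError, "backend #{requested.inspect} is not available"
    end

    PREFERENCE.find { |name| available.include?(name) } ||
      raise(ArgumentError, "no parsing backend available")
  end
end

p BackendSelection.choose(nil, available: %i[tree_sitter citrus])
p BackendSelection.choose(nil, available: %i[citrus])
p BackendSelection.choose(:citrus, available: %i[tree_sitter citrus])
```

An explicit request that cannot be honoured raises instead of silently falling back, so a test suite pinned to backend: :citrus cannot pass by accident on tree-sitter.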

Let’s start with Stage 1!