toml-rb as Alternative Backend - Feasibility Analysis

Executive Summary

YES, toml-rb is HIGHLY FEASIBLE as an alternative backend!

The toml-rb gem (using Citrus parser) provides sufficient AST-like capabilities to serve as a pure-Ruby alternative to tree-sitter. We can implement a dual-backend architecture similar to tree_haver.

What toml-rb Provides

1. AST-like Parse Tree (via Citrus)

require "toml-rb"

parsed = TomlRB::Document.parse(source)
# Returns: Citrus::Match with hierarchical match objects

The Citrus::Match objects provide:

  • Position tracking: offset and length properties
  • Line calculation: Can compute line numbers from byte offsets
  • Text extraction: Access matched text via string method
  • Hierarchical structure: matches array for sequential traversal
  • Type information: Via TomlRB value objects (Table, TableArray, Keyvalue)

2. Semantic TOML Objects

TomlRB wraps Citrus matches with semantic objects:

  • TomlRB::Table - TOML sections [section]
    • @dotted_keys - array of key components
    • full_key - complete dotted path
  • TomlRB::TableArray - Array of tables [[items]]
    • @dotted_keys - array of key components
    • full_key - complete dotted path
  • TomlRB::Keyvalue - Key-value pairs
    • @dotted_keys - array of key components
    • @value - parsed Ruby value (String, Integer, Array, Hash, etc.)

3. Position Information

Each Citrus::Match provides:

match.offset          # Byte offset from start
match.length          # Length in bytes
match.input.string    # Full source text
match.string          # Matched text

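# Calculate 1-based line numbers. The final character is excluded from the
# end_line count so a match that ends in "\n" does not spill onto the next line.
source = match.input.string
start_line = source[0...match.offset].count("\n") + 1
last_index = match.offset + [match.length - 1, 0].max
end_line = source[0...last_index].count("\n") + 1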
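Counting newlines from the start of the source costs O(n) per match, which adds up when every node needs a line number. A minimal sketch of a precomputed index follows. `LineIndex` and `line_at` are hypothetical names, not part of Citrus or toml-rb, and the sketch assumes offsets are character indexes into the same string the index was built from (worth confirming for non-ASCII input, where byte and character offsets differ).

```ruby
# Hypothetical helper: maps an offset in the source to a 1-based line number.
class LineIndex
  def initialize(source)
    # Offsets at which each line starts; line 1 always starts at offset 0.
    @line_starts = [0]
    source.each_char.with_index do |char, index|
      @line_starts << index + 1 if char == "\n"
    end
  end

  # Binary search for the first line start beyond the offset; its position in
  # the array equals the 1-based number of the line containing the offset.
  def line_at(offset)
    next_line = @line_starts.bsearch_index { |start| start > offset }
    next_line || @line_starts.length
  end
end

index = LineIndex.new("a = 1\n[server]\nport = 80\n")
index.line_at(0) # line containing "a = 1"
index.line_at(6) # line containing "[server]"
```

Building the index once per file and sharing it across all adapters keeps each lookup at O(log n).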

4. Access to Sub-structures

match.captures        # Hash of named captures from grammar
match.value           # TomlRB semantic object (Table, Keyvalue, etc.)
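As a rough illustration of how these accessors combine, the sketch below collects the top-level entries of a parsed document. It assumes that tables and key-value pairs are direct children of the document match and that non-semantic matches (blank lines, comments) report a nil `value`; both assumptions should be checked against the toml-rb grammar. `StubMatch` and `top_level_entries` are hypothetical names, and the stub only stands in for Citrus::Match so the example runs without toml-rb installed.

```ruby
# Stand-in for Citrus::Match; mimics only string, offset, value and matches.
StubMatch = Struct.new(:string, :offset, :value, :matches)

# Returns [match, value] pairs for direct children that carry a semantic value.
def top_level_entries(document_match)
  document_match.matches.filter_map do |child|
    value = child.value
    [child, value] unless value.nil?
  end
end

table = StubMatch.new("[server]\n", 0, :table_value, [])
blank = StubMatch.new("\n", 9, nil, [])
pair  = StubMatch.new("port = 80\n", 10, :keyvalue_value, [])
root  = StubMatch.new("[server]\n\nport = 80\n", 0, nil, [table, blank, pair])

top_level_entries(root).each do |match, value|
  puts "#{value} at offset #{match.offset}"
end
```

Keeping the match alongside its value matters here: the value object carries the TOML semantics, while only the match knows where the text sits in the source.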

Architecture Design

Following tree_haver’s pattern, we should implement:

lib/toml/merge/
  backends/
    backend_adapter.rb  # Abstract interface
    tree_sitter.rb      # Current implementation (wrap existing code)
    citrus.rb           # New toml-rb/citrus backend
  config.rb             # Backend selection

Backend Selection Logic

module Toml
  module Merge
    class << self
      # An explicit assignment wins, then TOML_MERGE_BACKEND, then :auto.
      def backend
        @backend ||= begin
          name = ENV["TOML_MERGE_BACKEND"].to_s.strip
          name.empty? ? :auto : name.to_sym
        end
      end

      def backend=(name)
        @backend = name&.to_sym
      end

      def backend_module
        case backend
        when :tree_sitter
          Backends::TreeSitter
        when :citrus
          Backends::Citrus
        when :auto
          # Prefer tree-sitter if available (better performance)
          if Backends::TreeSitter.available?
            Backends::TreeSitter
          elsif Backends::Citrus.available?
            Backends::Citrus
          else
            raise "No TOML parsing backend available"
          end
        else
          # Fail loudly on a misspelled name instead of silently using :auto
          raise ArgumentError, "Unknown TOML backend: #{backend.inspect}"
        end
      end
    end
  end
end

Backend Interface

Each backend should implement:

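module Toml::Merge::Backends
  # Contract for backend modules. A backend does `extend BackendInterface`
  # and overrides each method below.
  module BackendInterface
    # Check if backend is available
    def available?
      raise NotImplementedError, "#{self} must implement available?"
    end

    # Parse source and return wrapped AST
    def parse(source)  # => FileAnalysis-compatible object
      raise NotImplementedError, "#{self} must implement parse"
    end

    # Return capabilities
    def capabilities
      # e.g. { backend: :citrus, supports_comments: true, ... }
      raise NotImplementedError, "#{self} must implement capabilities"
    end
  end
end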
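One way a backend could answer `available?` is to attempt the require and treat a LoadError as "not available". The sketch below assumes that approach. `loadable?` is a hypothetical helper, not an existing toml-merge method. For tree-sitter, a loadable gem is only part of the answer, since the native grammar library must also be found; that check is not sketched here.

```ruby
module Toml
  module Merge
    module Backends
      # Hypothetical helper: true when `feature` can be required, false otherwise.
      def self.loadable?(feature)
        require feature
        true
      rescue LoadError
        false
      end

      module Citrus
        # The toml-rb gem pulls in Citrus, so one require covers both.
        def self.available?
          Backends.loadable?("toml-rb")
        end
      end
    end
  end
end

puts "citrus backend available: #{Toml::Merge::Backends::Citrus.available?}"
```

Because `require` is idempotent, calling `available?` repeatedly is cheap once the gem has been loaded.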

Node Wrapper Abstraction

Create a unified node wrapper that works with both backends:

# Current: NodeWrapper wraps TreeSitter::Node
# New:     NodeWrapper can wrap either TreeSitter::Node OR Citrus::Match

class NodeWrapper
  def initialize(node, backend:, **options)
    @node = node
    @backend = backend
    @options = options

    @adapter =
      case backend
      when :tree_sitter
        TreeSitterAdapter.new(node, **options)
      when :citrus
        CitrusAdapter.new(node, **options)
      else
        # Without this branch an unknown backend leaves @adapter nil
        raise ArgumentError, "Unknown backend: #{backend.inspect}"
      end
  end

  # Delegate to adapter
  def type; @adapter.type; end
  def start_line; @adapter.start_line; end
  def end_line; @adapter.end_line; end
  def text; @adapter.text; end
  def signature; @adapter.signature; end
  # ... etc
end
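A minimal sketch of what the Citrus side of this delegation might look like, assuming the match responds to `string`, `offset`, `length` and `value` as described earlier. The `source:` option and the `TYPE_BY_CLASS` table are assumptions made for illustration. Types are looked up by class name so the sketch does not need toml-rb loaded, and `signature` is left out because its format is not specified here.

```ruby
# Hypothetical adapter exposing the NodeWrapper-facing methods for one match.
class CitrusAdapter
  # Maps TomlRB value class names to node type symbols.
  TYPE_BY_CLASS = {
    "TomlRB::Table" => :table,
    "TomlRB::TableArray" => :array_of_tables,
    "TomlRB::Keyvalue" => :pair,
  }.freeze

  def initialize(match, source:)
    @match = match
    @source = source
  end

  def type
    TYPE_BY_CLASS.fetch(@match.value.class.name, :unknown)
  end

  def text
    @match.string
  end

  def start_line
    @source[0...@match.offset].count("\n") + 1
  end

  # Excludes the final character so a trailing "\n" does not add a line.
  def end_line
    last_index = @match.offset + [@match.length - 1, 0].max
    @source[0...last_index].count("\n") + 1
  end
end

source = "a = 1\n[server]\nport = 80\n"
stub_match = Struct.new(:string, :offset, :length, :value)
adapter = CitrusAdapter.new(stub_match.new("[server]\n", 6, 9, nil), source: source)
puts "lines #{adapter.start_line}..#{adapter.end_line}, type #{adapter.type}"
```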

Implementation Plan

Phase 1: Backend Infrastructure (Foundation)

  1. Create lib/toml/merge/backends/ directory
  2. Create backend_adapter.rb with interface definition
  3. Refactor existing code into backends/tree_sitter.rb
  4. Add backend selection logic to main module
  5. Add tests for backend switching
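Tests for backend switching are easier to write if the selection can be overridden for the duration of a block and then restored. The sketch below assumes a `with_backend` helper, which is a hypothetical addition; the `backend` accessors shown are minimal stand-ins for the ones under Backend Selection Logic, included only so the example runs on its own.

```ruby
module Toml
  module Merge
    class << self
      # Minimal stand-ins for the real accessors.
      attr_writer :backend

      def backend
        @backend ||= :auto
      end

      # Hypothetical helper: select `name` for the block, then restore the
      # previous selection even if the block raises.
      def with_backend(name)
        previous = @backend
        self.backend = name.to_sym
        yield
      ensure
        @backend = previous
      end
    end
  end
end

Toml::Merge.backend = :tree_sitter
inside = Toml::Merge.with_backend(:citrus) { Toml::Merge.backend }
puts "inside block: #{inside}, afterwards: #{Toml::Merge.backend}"
```

Restoring in `ensure` keeps one failing spec from leaking its backend choice into the next.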

Phase 2: Citrus Backend (Core Implementation)

  1. Create backends/citrus.rb
  2. Implement CitrusAdapter to wrap Citrus::Match objects
  3. Map Citrus structures to NodeWrapper interface:
    • TomlRB::Table → table type
    • TomlRB::TableArray → array_of_tables type
    • TomlRB::Keyvalue → pair type
  4. Implement position tracking (line numbers from byte offsets)
  5. Add comment extraction from Citrus matches

Phase 3: FileAnalysis Integration

  1. Update FileAnalysis to use backend system
  2. Make parser initialization backend-aware
  3. Ensure signature generation works with both backends

Phase 4: Testing & Validation

  1. Add backend-specific specs
  2. Run full test suite against both backends
  3. Add integration tests comparing results
  4. Performance benchmarking (tree-sitter vs citrus)

Phase 5: Documentation & Polish

  1. Update README with backend options
  2. Document environment variables
  3. Add backend selection examples
  4. Update installation instructions

Benefits of Dual Backend

1. Platform Flexibility

  • tree-sitter: Fast, but requires native libraries
  • citrus: Pure Ruby with no native extension, so it should run wherever Ruby does (JRuby, TruffleRuby, restricted environments)

2. Graceful Degradation

  • Try tree-sitter first (performance)
  • Fall back to citrus (compatibility)
  • Users can force backend via ENV["TOML_MERGE_BACKEND"]

3. Testing Coverage

  • Test both implementations
  • Catch backend-specific bugs
  • Validate semantic correctness

4. Future-Proofing

  • Easy to add more backends later
  • Clear abstraction layer
  • Follows tree_haver’s proven pattern

Risks & Mitigation

Risk 1: Performance Difference

Risk: Citrus may be slower than tree-sitter
Mitigation:

  • Default to tree-sitter when available
  • Benchmark both on real-world files
  • Document performance characteristics
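A rough timing harness for that benchmark could look like the sketch below. `time_backends` is a hypothetical name, and the two lambdas are stand-ins with no relation to real parser cost; actual measurements would pass each backend's parse entry point instead.

```ruby
# Hypothetical harness: mean seconds per call for each parser callable.
def time_backends(parsers, source, iterations: 100)
  parsers.transform_values do |parser|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    iterations.times { parser.call(source) }
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    elapsed / iterations
  end
end

# Stand-in callables so the sketch runs without either backend installed.
parsers = {
  tree_sitter: ->(source) { source.lines.length },
  citrus: ->(source) { source.each_line.count },
}

timings = time_backends(parsers, "a = 1\nb = 2\n" * 1_000)
timings.each do |name, seconds|
  puts format("%-12s %.6f s/parse", name, seconds)
end
```

The monotonic clock avoids skew from wall-clock adjustments. Running against several real-world files of different sizes gives a fairer picture than one synthetic input.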

Risk 2: Semantic Differences

Risk: Backends might parse/represent TOML differently
Mitigation:

  • Comprehensive test suite covering both backends
  • Use TOML spec compliance tests
  • Validate output equivalence
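Output equivalence can be checked by reducing each backend's nodes to plain tuples and comparing them position by position. The sketch below assumes nodes respond to the NodeWrapper methods `type`, `start_line`, `end_line` and `text`. `StubNode`, `node_summary` and `backend_mismatches` are hypothetical names used for illustration.

```ruby
# Stand-in for NodeWrapper so the sketch runs on its own.
StubNode = Struct.new(:type, :start_line, :end_line, :text)

# Reduces nodes to comparable tuples.
def node_summary(nodes)
  nodes.map { |node| [node.type, node.start_line, node.end_line, node.text] }
end

# Returns the indexes at which the two backends disagree (empty when equal).
def backend_mismatches(nodes_a, nodes_b)
  a = node_summary(nodes_a)
  b = node_summary(nodes_b)
  (0...[a.length, b.length].max).reject { |i| a[i] == b[i] }
end

from_tree_sitter = [StubNode.new(:table, 1, 1, "[server]"), StubNode.new(:pair, 2, 2, "port = 80")]
from_citrus      = [StubNode.new(:table, 1, 1, "[server]"), StubNode.new(:pair, 2, 3, "port = 80")]

puts "mismatched node indexes: #{backend_mismatches(from_tree_sitter, from_citrus).inspect}"
```

If the backends differ only in harmless details such as trailing whitespace in `text`, normalizing inside `node_summary` keeps the comparison focused on structure.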

Risk 3: Maintenance Burden

Risk: Two backends = 2x maintenance
Mitigation:

  • Strong abstraction layer minimizes duplication
  • Shared test suite validates both
  • Clear backend interface contract

Risk 4: Comment Handling

Risk: Comments might be harder to track in Citrus
Mitigation:

  • Citrus matches include comment nodes
  • Can extract via pattern matching
  • May need special handling for inline comments
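The inline case can be separated from full-line comments by looking at what precedes the "#" on its own line. The sketch below assumes the offset of the comment's "#" has already been obtained from a comment match; deciding which "#" characters start comments (as opposed to sitting inside strings) stays with the parser. `comment_kind` is a hypothetical helper name.

```ruby
# Hypothetical helper: :inline when content precedes the comment on its line,
# :full_line when only whitespace does.
def comment_kind(source, hash_offset)
  line_start =
    if hash_offset.zero?
      0
    else
      # rindex searches backwards from the character before the "#"
      (source.rindex("\n", hash_offset - 1) || -1) + 1
    end
  prefix = source[line_start...hash_offset]
  prefix.strip.empty? ? :full_line : :inline
end

source = "# header\nport = 80 # default\n  # indented\n"
offset = source.index("#", 1) # the "#" after "port = 80"
puts "comment at #{offset} is #{comment_kind(source, offset)}"
```

An inline comment would normally stay attached to the key-value pair on the same line, while a full-line comment is a candidate for attaching to the node that follows it.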

Code Organization

lib/toml/merge/
├── backends/
│   ├── backend_adapter.rb        # Abstract interface
│   ├── tree_sitter.rb            # Existing tree-sitter implementation
│   ├── citrus.rb                 # New toml-rb/citrus implementation
│   └── adapters/
│       ├── tree_sitter_node.rb   # TreeSitter::Node adapter
│       └── citrus_match.rb       # Citrus::Match adapter
├── config.rb                     # Backend selection
├── file_analysis.rb              # Updated to use backends
├── node_wrapper.rb               # Updated to use adapters
└── ...

spec/toml/merge/
├── backends/
│   ├── tree_sitter_spec.rb
│   ├── citrus_spec.rb
│   └── shared_examples.rb        # Shared behavior tests
└── ...

Conclusion

toml-rb with Citrus is DEFINITELY a viable alternative backend.

The Citrus parse tree provides all necessary information:

  • ✅ Node types and structure
  • ✅ Position information (with calculation)
  • ✅ Text extraction
  • ✅ Hierarchical traversal
  • ✅ Semantic type information

Recommendation: Proceed with dual-backend implementation following tree_haver’s architecture pattern. This will give toml-merge maximum flexibility, broader platform support, and a more robust codebase.

The main work is creating the abstraction layer and adapter classes; the underlying data is sufficient for our needs, with comment handling (Risk 4) the one area that still needs validation.