toml-rb as an Alternative Backend - Feasibility Analysis
Executive Summary
Yes, toml-rb is a highly feasible alternative backend.
The toml-rb gem (built on the Citrus parser) provides sufficient AST-like capabilities to serve as a pure-Ruby alternative to tree-sitter. We can implement a dual-backend architecture similar to tree_haver's.
What toml-rb Provides
1. AST-like Parse Tree (via Citrus)
require "toml-rb"
parsed = TomlRB::Document.parse(source)
# Returns: Citrus::Match with hierarchical match objects
The Citrus::Match objects provide:
- Position tracking: offset and length properties
- Line calculation: line numbers can be computed from offsets
- Text extraction: access the matched text via the string method
- Hierarchical structure: matches array for sequential traversal
- Type information: via TomlRB value objects (Table, TableArray, Keyvalue)
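Sequential traversal only needs each node to expose its children via `matches`. The following is a minimal sketch of a depth-first, pre-order walk. It assumes nothing beyond the `string` and `matches` methods listed above, and it substitutes a hypothetical `StubMatch` Struct for real Citrus::Match objects so it runs without the gem.

```ruby
# Hypothetical stand-in for Citrus::Match: models only the two methods
# the walker relies on (`string` and `matches`).
StubMatch = Struct.new(:string, :matches)

# Depth-first, pre-order walk over a Citrus-style match tree.
# Returns [node, depth] pairs, with the root at depth 0.
def flatten_matches(node, depth = 0, acc = [])
  acc << [node, depth]
  node.matches.each { |child| flatten_matches(child, depth + 1, acc) }
  acc
end

root = StubMatch.new("a = 1\n", [
  StubMatch.new("a", []),
  StubMatch.new("=", []),
  StubMatch.new("1", [])
])

# Print an indented outline of the tree
flatten_matches(root).each do |match, depth|
  puts "#{'  ' * depth}#{match.string.inspect}"
end
```

Returning pairs rather than yielding keeps the helper easy to filter. For example, a caller could keep only the nodes whose `value` is a TomlRB semantic object.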
2. Semantic TOML Objects
TomlRB wraps Citrus matches with semantic objects:
- TomlRB::Table - TOML sections ([section])
  - @dotted_keys - array of key components
  - full_key - complete dotted path
- TomlRB::TableArray - Array of tables ([[items]])
  - @dotted_keys - array of key components
  - full_key - complete dotted path
- TomlRB::Keyvalue - Key-value pairs
  - @dotted_keys - array of key components
  - @value - parsed Ruby value (String, Integer, Array, Hash, etc.)
3. Position Information
Each Citrus::Match provides:
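match.offset        # Offset from start of input (the slicing below treats it as a
                    # character index, which equals the byte offset for ASCII-only sources)
match.length        # Length of the matched text
match.input.string  # Full source text
match.string        # Matched text
# Calculate line numbers. end_line is taken from the match's last character,
# so a match that ends with a newline is not pushed onto the following line.
source = match.input.string
start_line = source[0...match.offset].count("\n") + 1
last_char = match.length.positive? ? match.offset + match.length - 1 : match.offset
end_line = source[0...last_char].count("\n") + 1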
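Slicing and counting newlines rescans the source on every lookup. When many nodes are wrapped, it is cheaper to precompute where each line starts once, and then binary-search it. The sketch below is a hypothetical `LineIndex` helper, not part of Citrus or toml-rb. Like the formulas above, it assumes offsets are character indices into the source.

```ruby
# Hypothetical helper: precomputes line-start offsets so each
# offset -> line lookup is O(log n) instead of re-slicing the source.
# Assumes offsets are character indices (same as bytes for ASCII input).
class LineIndex
  def initialize(source)
    @line_starts = [0]
    source.each_char.with_index do |char, i|
      @line_starts << i + 1 if char == "\n"
    end
  end

  # 1-based line number containing the character at `offset`.
  def line_for(offset)
    # Index of the first line start beyond `offset`; that index is the
    # 1-based number of the line containing `offset`.
    @line_starts.bsearch_index { |start| start > offset } || @line_starts.size
  end

  # [start_line, end_line] for a span. The end is taken from the span's
  # last character, so a trailing newline stays on its own line.
  def span_lines(offset, length)
    last = length.positive? ? offset + length - 1 : offset
    [line_for(offset), line_for(last)]
  end
end

source = "[server]\nhost = \"x\"\nport = 80\n"
index = LineIndex.new(source)
p index.span_lines(source.index("host"), "host = \"x\"\n".length) # prints [2, 2]
```

If a key-value match includes its trailing newline, counting newlines over the whole span, up to `offset + length`, would report line 3 for the `host` entry. Anchoring on the last character avoids that off-by-one.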
4. Access to Sub-structures
match.captures # Hash of named captures from grammar
match.value # TomlRB semantic object (Table, Keyvalue, etc.)
Architecture Design
Following tree_haver’s pattern, we should implement:
lib/toml/merge/
  backends/
    tree_sitter.rb       # Current implementation (wrap existing code)
    citrus.rb            # New toml-rb/citrus backend
    backend_adapter.rb   # Abstract interface
  config.rb              # Backend selection
Backend Selection Logic
module Toml
  module Merge
    class << self
      # Backend name: ENV["TOML_MERGE_BACKEND"] if set, otherwise :auto
      def backend
        @backend ||= ENV["TOML_MERGE_BACKEND"]&.to_sym || :auto
      end

      def backend=(name)
        @backend = name&.to_sym
      end

      def backend_module
        case backend
        when :tree_sitter
          Backends::TreeSitter
        when :citrus
          Backends::Citrus
        when :auto
          # Prefer tree-sitter if available (better performance)
          if Backends::TreeSitter.available?
            Backends::TreeSitter
          elsif Backends::Citrus.available?
            Backends::Citrus
          else
            raise "No TOML parsing backend available"
          end
        else
          # Fail loudly on a misspelled backend instead of silently using :auto
          raise ArgumentError, "Unknown TOML backend: #{backend.inspect}"
        end
      end
    end
  end
end
Backend Interface
Each backend should implement:
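module Toml
  module Merge
    module Backends
      # Contract for backend modules: each backend does `extend BackendInterface`
      # and overrides these methods.
      module BackendInterface
        # Check if backend is available (its dependencies can be loaded)
        def available?
          raise NotImplementedError, "#{name} must implement .available?"
        end

        # Parse source and return a FileAnalysis-compatible object
        def parse(source)
          raise NotImplementedError, "#{name} must implement .parse"
        end

        # Return capabilities, e.g. { backend: :citrus, supports_comments: true, ... }
        def capabilities
          raise NotImplementedError, "#{name} must implement .capabilities"
        end
      end
    end
  end
end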
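A concrete backend can satisfy this contract while loading its gem lazily, so that referencing the backend never raises when toml-rb is missing. The following is a hypothetical sketch. It restates a minimal interface so it runs on its own, and its `parse` returns the raw root Citrus::Match rather than a full FileAnalysis object.

```ruby
module Toml
  module Merge
    module Backends
      # Minimal restatement of the interface so this sketch is self-contained.
      module BackendInterface
        def available?
          raise NotImplementedError, "#{name}.available?"
        end

        def parse(source)
          raise NotImplementedError, "#{name}.parse"
        end

        def capabilities
          raise NotImplementedError, "#{name}.capabilities"
        end
      end

      # Hypothetical Citrus backend. Inside this namespace, `Citrus` resolves
      # to this module, so the gem's classes must be written as ::Citrus::...
      module Citrus
        extend BackendInterface

        # Memoized: tries to load toml-rb once and reports whether it worked.
        def self.available?
          return @available unless @available.nil?

          @available = begin
            require "toml-rb"
            true
          rescue LoadError
            false
          end
        end

        # Returns the root Citrus::Match; wrapping it into a
        # FileAnalysis-compatible object is left to the real implementation.
        def self.parse(source)
          raise LoadError, "toml-rb is not installed" unless available?

          TomlRB::Document.parse(source)
        end

        def self.capabilities
          { backend: :citrus, supports_comments: true }
        end
      end
    end
  end
end
```

Singleton methods defined with `def self.` take precedence over the methods added by `extend`. The interface's `NotImplementedError` defaults therefore surface only when a backend forgets to override one of them.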
Node Wrapper Abstraction
Create a unified node wrapper that works with both backends:
# Current: NodeWrapper wraps TreeSitter::Node
# New: NodeWrapper can wrap either TreeSitter::Node OR Citrus::Match
class NodeWrapper
  def initialize(node, backend:, **options)
    @node = node
    @backend = backend
    @options = options
    @adapter =
      case backend
      when :tree_sitter then TreeSitterAdapter.new(node, **options)
      when :citrus then CitrusAdapter.new(node, **options)
      else raise ArgumentError, "Unsupported backend: #{backend.inspect}"
      end
  end

  # Delegate to adapter
  def type; @adapter.type; end
  def start_line; @adapter.start_line; end
  def end_line; @adapter.end_line; end
  def text; @adapter.text; end
  def signature; @adapter.signature; end
  # ... etc
end
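The Citrus side of this wrapper can be sketched as below. This is a hypothetical `CitrusAdapter`. It assumes the match responds to `offset`, `length`, `string`, `input.string`, and `value`, as described in the section on what toml-rb provides. The type names follow the Phase 2 mapping, and the map is keyed on class-name strings so the sketch does not need toml-rb loaded. `signature` is omitted because its format depends on the existing NodeWrapper.

```ruby
# Hypothetical adapter exposing NodeWrapper's interface over a Citrus::Match.
class CitrusAdapter
  # TomlRB semantic class name -> NodeWrapper type (per Phase 2 mapping)
  TYPE_MAP = {
    "TomlRB::Table" => :table,
    "TomlRB::TableArray" => :array_of_tables,
    "TomlRB::Keyvalue" => :pair
  }.freeze

  def initialize(match, **options)
    @match = match
    @options = options
  end

  # Comments and line breaks have no semantic value object -> :unknown
  def type
    TYPE_MAP.fetch(@match.value.class.name, :unknown)
  end

  def text
    @match.string
  end

  def start_line
    line_at(@match.offset)
  end

  # Uses the last character so a trailing newline is not an extra line.
  def end_line
    length = @match.length
    line_at(length.positive? ? @match.offset + length - 1 : @match.offset)
  end

  private

  # 1-based line containing the character at `offset`
  def line_at(offset)
    @match.input.string[0...offset].count("\n") + 1
  end
end
```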
Implementation Plan
Phase 1: Backend Infrastructure (Foundation)
- Create lib/toml/merge/backends/ directory
- Create backend_adapter.rb with interface definition
- Refactor existing code into backends/tree_sitter.rb
- Add backend selection logic to main module
- Add tests for backend switching
Phase 2: Citrus Backend (Core Implementation)
- Create backends/citrus.rb
- Implement CitrusNodeAdapter to wrap Citrus::Match objects
- Map Citrus structures to NodeWrapper interface:
  - TomlRB::Table → table type
  - TomlRB::TableArray → array_of_tables type
  - TomlRB::Keyvalue → pair type
- Implement position tracking (line numbers from byte offsets)
- Add comment extraction from Citrus matches
Phase 3: FileAnalysis Integration
- Update FileAnalysis to use backend system
- Make parser initialization backend-aware
- Ensure signature generation works with both backends
Phase 4: Testing & Validation
- Add backend-specific specs
- Run full test suite against both backends
- Add integration tests comparing results
- Performance benchmarking (tree-sitter vs citrus)
Phase 5: Documentation & Polish
- Update README with backend options
- Document environment variables
- Add backend selection examples
- Update installation instructions
Benefits of Dual Backend
1. Platform Flexibility
- tree-sitter: Fast, but requires native libraries
- citrus: Pure Ruby with no native extensions, so it runs on any Ruby that can run toml-rb (JRuby, TruffleRuby, limited environments)
2. Graceful Degradation
- Try tree-sitter first (performance)
- Fall back to citrus (compatibility)
- Users can force a backend via ENV["TOML_MERGE_BACKEND"]
3. Testing Coverage
- Test both implementations
- Catch backend-specific bugs
- Validate semantic correctness
4. Future-Proofing
- Easy to add more backends later
- Clear abstraction layer
- Follows tree_haver’s proven pattern
Risks & Mitigation
Risk 1: Performance Difference
Risk: Citrus may be slower than tree-sitter
Mitigation:
- Default to tree-sitter when available
- Benchmark both on real-world files
- Document performance characteristics
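A benchmark only needs each backend exposed as something callable on the same source. The sketch below is a hypothetical helper built on Ruby's standard `Benchmark.realtime`. It reports average seconds per parse and does one untimed warm-up run per backend. How the real backends are wired in is left to the caller.

```ruby
require "benchmark"

# Hypothetical helper: times each parser on the same source and returns
# { name => average seconds per parse }. `parsers` maps a backend name to
# any object responding to #call(source).
def benchmark_backends(source, parsers, iterations: 50)
  parsers.to_h do |name, parser|
    parser.call(source) # warm-up run, excluded from timing
    elapsed = Benchmark.realtime { iterations.times { parser.call(source) } }
    [name, elapsed / iterations]
  end
end

# Possible usage against the planned backends (names from this document):
#   timings = benchmark_backends(File.read("Cargo.toml"), {
#     tree_sitter: ->(src) { Toml::Merge::Backends::TreeSitter.parse(src) },
#     citrus: ->(src) { Toml::Merge::Backends::Citrus.parse(src) }
#   })
```

Averaging over several iterations smooths out timer resolution. Running it on several real-world files gives the per-file numbers the mitigation calls for.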
Risk 2: Semantic Differences
Risk: Backends might parse/represent TOML differently
Mitigation:
- Comprehensive test suite covering both backends
- Use TOML spec compliance tests
- Validate output equivalence
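One way to check output equivalence is to reduce each backend's wrapped nodes to plain summaries and compare them position by position. In the following sketch, the `NodeSummary` shape ([type, start_line, end_line, text]) and the helper name are hypothetical choices, not an existing toml-merge API.

```ruby
# Hypothetical summary of one wrapped node, comparable across backends.
NodeSummary = Struct.new(:type, :start_line, :end_line, :text)

# Returns human-readable differences; an empty array means the two
# backends produced equivalent node lists.
def backend_differences(left, right)
  diffs = []
  if left.size != right.size
    diffs << "node count differs: #{left.size} vs #{right.size}"
  end
  left.zip(right).each_with_index do |(a, b), i|
    next if a == b # Struct#== compares all members

    diffs << "node #{i}: #{a.to_a.inspect} vs #{b.to_a.inspect}"
  end
  diffs
end
```

A shared integration spec could build both lists from the same fixture and expect `backend_differences(...)` to be empty. The messages then point straight at the first diverging node.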
Risk 3: Maintenance Burden
Risk: Two backends = 2x maintenance
Mitigation:
- Strong abstraction layer minimizes duplication
- Shared test suite validates both
- Clear backend interface contract
Risk 4: Comment Handling
Risk: Comments might be harder to track in Citrus
Mitigation:
- Citrus matches include comment nodes
- Can extract via pattern matching
- May need special handling for inline comments
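Inline comments are the tricky case because a `#` inside a TOML string is not a comment. The following is a hypothetical sketch of per-line handling. It tracks basic ("...") strings, including backslash escapes, and literal ('...') strings before treating `#` as the start of a comment. Multi-line strings (""" and ''') are deliberately out of scope here.

```ruby
# Hypothetical helper: splits one TOML line into [code, comment].
# comment is nil when the line has no comment outside of strings.
def split_inline_comment(line)
  quote = nil     # currently open quote character, if any
  escaped = false # previous char was a backslash inside a basic string

  line.each_char.with_index do |char, i|
    if quote
      if escaped
        escaped = false
      elsif char == "\\" && quote == '"' # escapes only exist in basic strings
        escaped = true
      elsif char == quote
        quote = nil
      end
    elsif char == '"' || char == "'"
      quote = char
    elsif char == "#"
      return [line[0...i].rstrip, line[i..].chomp]
    end
  end

  [line.chomp, nil]
end

p split_inline_comment(%(name = "a#b" # owner\n)) # prints ["name = \"a#b\"", "# owner"]
```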
Code Organization
lib/toml/merge/
├── backends/
│ ├── backend_adapter.rb # Abstract interface
│ ├── tree_sitter.rb # Existing tree-sitter implementation
│ ├── citrus.rb # New toml-rb/citrus implementation
│ └── adapters/
│ ├── tree_sitter_node.rb # TreeSitter::Node adapter
│ └── citrus_match.rb # Citrus::Match adapter
├── config.rb # Backend selection
├── file_analysis.rb # Updated to use backends
├── node_wrapper.rb # Updated to use adapters
└── ...
spec/toml/merge/
├── backends/
│ ├── tree_sitter_spec.rb
│ ├── citrus_spec.rb
│ └── shared_examples.rb # Shared behavior tests
└── ...
Conclusion
toml-rb with Citrus is a viable alternative backend.
The Citrus parse tree provides all necessary information:
- ✅ Node types and structure
- ✅ Position information (with calculation)
- ✅ Text extraction
- ✅ Hierarchical traversal
- ✅ Semantic type information
Recommendation: Proceed with dual-backend implementation following tree_haver’s architecture pattern. This will give toml-merge maximum flexibility, broader platform support, and a more robust codebase.
The main work is creating the abstraction layer and adapter classes; the underlying data is sufficient for our needs.