Skip to content

OpenTelemetry Integration ​

Logly.zig provides comprehensive OpenTelemetry (OTEL) support for distributed tracing, metrics collection, and observability integration with external monitoring systems.

Overview ​

OpenTelemetry is a collection of tools, APIs, and SDKs used to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software's performance and behavior.

Logly's telemetry module integrates with:

  • utils.zig: ID generation, time utilities, sampling, JSON escaping
  • async.zig: Non-blocking span export with ring buffer
  • network.zig: TCP/UDP/Syslog transport for remote exporters
  • thread_pool.zig: Parallel span processing for high throughput

Key Features ​

FeatureDescription
Multiple ProvidersJaeger, Zipkin, Datadog, Google Cloud, AWS X-Ray, Azure, and more
W3C Trace ContextFull W3C traceparent/tracestate header support
W3C BaggageContext propagation across service boundaries
Metrics CollectionCounter, gauge, histogram, and summary metrics
Export ModesSync, async buffer, batch, and network export
Custom ProvidersBuild your own exporter with callback interface
Sampling Strategiesalways_on, always_off, trace_id_ratio, parent_based
Thread SafetyMutex-protected concurrent span/metric access

Getting Started ​

Basic Setup ​

zig
const std = @import("std");
const logly = @import("logly");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Disable update checker
    logly.UpdateChecker.setEnabled(false);

    // Configure telemetry with Jaeger
    var config = logly.TelemetryConfig.jaeger();
    config.service_name = "my-service";
    config.service_version = "1.0.0";
    config.environment = "production";

    var telemetry = try logly.Telemetry.init(allocator, config);
    defer telemetry.deinit();

    // Your tracing code here
}

Creating Spans ​

Spans represent a single unit of work in a distributed trace:

zig
var span = try telemetry.startSpan("operation_name", .{
    .kind = .server, // .client, .internal, .producer, .consumer
});
defer span.deinit();

// Add attributes
try span.setAttribute("http.method", .{ .string = "POST" });
try span.setAttribute("http.status_code", .{ .integer = 200 });

// Add events (timestamped annotations)
try span.addEvent("validation_completed", null);

// Set final status
span.setStatus(.ok);

// End and export span
span.end();
try telemetry.endSpan(&span);
try telemetry.exportSpans();

Supported Providers ​

Built-in Providers ​

ProviderFactory FunctionDefault Endpoint
JaegerTelemetryConfig.jaeger()http://localhost:6831
ZipkinTelemetryConfig.zipkin()http://localhost:9411/api/v2/spans
DatadogTelemetryConfig.datadog(api_key)http://localhost:8126/v0.3/traces
Google CloudTelemetryConfig.googleCloud(project, key)https://cloudtrace.googleapis.com/v2
Google Analytics 4TelemetryConfig.googleAnalytics(id, secret)https://www.google-analytics.com/mp/collect
Google Tag ManagerTelemetryConfig.googleTagManager(url, key)Custom GTM endpoint
AWS X-RayTelemetryConfig.awsXray(region)http://localhost:2000
AzureTelemetryConfig.azure(connection_string)Application Insights endpoint
OTEL CollectorTelemetryConfig.otelCollector(endpoint)Custom endpoint
FileTelemetryConfig.file(path)Local JSONL file

Custom Provider ​

Create your own exporter implementation:

zig
fn myCustomExporter() anyerror!void {
    // Custom export logic - send to your own backend
    std.debug.print("Custom export triggered\n", .{});
    
    // Example: Send to custom HTTP endpoint
    // const client = try std.http.Client.init(allocator);
    // defer client.deinit();
    // ...
}

var config = logly.TelemetryConfig.custom(&myCustomExporter);
config.service_name = "custom-service";
config.batch_size = 100;
config.batch_timeout_ms = 5000;

W3C Standards ​

Trace Context (traceparent) ​

Generate and parse W3C traceparent headers:

zig
// Generate header for outgoing request
var span = try telemetry.startSpan("outbound_call", .{});
const traceparent = try telemetry.getTraceparentHeader(&span);
defer allocator.free(traceparent);
// Format: 00-{trace_id}-{span_id}-{flags}
// Example: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

// Parse incoming header
const incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
if (logly.Telemetry.parseTraceparentHeader(incoming)) |ctx| {
    // ctx.trace_id, ctx.span_id, ctx.sampled
}

Baggage ​

Propagate arbitrary context across services:

zig
var baggage = logly.Baggage.init(allocator);
defer baggage.deinit();

try baggage.set("user_id", "user-123");
try baggage.set("tenant_id", "tenant-456");

// Serialize for HTTP header
const header = try baggage.toHeaderValue(allocator);
defer allocator.free(header);
// Output: "user_id=user-123,tenant_id=tenant-456"

// Parse incoming baggage
var parsed = try logly.Baggage.fromHeaderValue(allocator, header);
defer parsed.deinit();

Parent-Child Relationships ​

Create nested spans with automatic context propagation:

zig
// Parent span (creates new trace)
var parent = try telemetry.startSpan("http_handler", .{ .kind = .server });
defer parent.deinit();

// Child span (inherits trace_id from parent)
var child = try telemetry.startSpanWithContext("database_query", &parent, .{
    .kind = .client,
});
defer child.deinit();

// Grandchild span
var grandchild = try telemetry.startSpanWithContext("cache_lookup", &child, .{
    .kind = .internal,
});
defer grandchild.deinit();

// All spans share the same trace_id
// child.parent_span_id == parent.span_id
// grandchild.parent_span_id == child.span_id

Metrics ​

Recording Metrics ​

zig
// Full API with options
try telemetry.recordMetric("http.request.duration_ms", 123.45, .{
    .kind = .gauge,
    .unit = "ms",
    .description = "HTTP request duration",
});

// Convenience methods
try telemetry.recordCounter("requests.total", 1.0);
try telemetry.recordGauge("cpu.usage", 45.5);
try telemetry.recordHistogram("response.latency", 123.4);

// Export metrics
try telemetry.exportMetrics();

Metric Kinds ​

KindDescriptionUse Case
counterMonotonically increasingRequest counts, errors
gaugePoint-in-time valueCPU usage, memory
histogramValue distributionResponse times
summaryAggregated statisticsPercentiles

Sampling Strategies ​

Control which traces are recorded:

zig
// Always sample (development)
config.sampling_strategy = .always_on;

// Never sample (disabled)
config.sampling_strategy = .always_off;

// Sample percentage (production)
config.sampling_strategy = .trace_id_ratio;
config.sampling_rate = 0.1; // 10% of traces

// Follow parent's decision (distributed)
config.sampling_strategy = .parent_based;

Callbacks ​

Monitor telemetry events:

zig
var config = logly.TelemetryConfig.file("spans.jsonl");

config.on_span_start = struct {
    fn callback(span_id: []const u8, name: []const u8) void {
        std.debug.print("Span started: {s}\n", .{name});
    }
}.callback;

config.on_span_end = struct {
    fn callback(span_id: []const u8, duration_ns: u64) void {
        const ms = @as(f64, @floatFromInt(duration_ns)) / 1_000_000.0;
        std.debug.print("Span ended: {d:.2}ms\n", .{ms});
    }
}.callback;

config.on_metric_recorded = struct {
    fn callback(name: []const u8, value: f64) void {
        std.debug.print("Metric: {s} = {d}\n", .{name, value});
    }
}.callback;

config.on_error = struct {
    fn callback(msg: []const u8) void {
        std.debug.print("Error: {s}\n", .{msg});
    }
}.callback;

Export Statistics ​

Monitor exporter performance:

zig
const stats = telemetry.getExporterStats();

std.debug.print("Spans exported: {}\n", .{stats.getSpansExported()});
std.debug.print("Bytes sent: {}\n", .{stats.getBytesExported()});
std.debug.print("Export errors: {}\n", .{stats.getExportErrors()});
std.debug.print("Error rate: {d:.2}%\n", .{stats.getErrorRate() * 100.0});

Runtime Control ​

Control telemetry behavior at runtime without reinitializing:

Enable/Disable Telemetry ​

zig
// Check current state
if (telemetry.isEnabled()) {
    std.debug.print("Telemetry is active\n", .{});
}

// Disable temporarily
telemetry.setEnabled(false);

// Spans created while disabled are no-ops
const span = try telemetry.startSpan("test", .{});
std.debug.assert(span.isEmpty()); // true

// Re-enable
telemetry.setEnabled(true);

Update Resource Configuration ​

zig
// Get current resource
const resource = telemetry.getResource();

// Update at runtime
telemetry.setResource(.{
    .service_name = "updated-service",
    .service_version = "2.0.0",
    .environment = "staging",
    .datacenter = "us-west-2",
});

Reset Statistics ​

zig
// Clear all counters
telemetry.resetStats();

Distributed Trace Continuation ​

Create spans from incoming W3C traceparent headers:

zig
// Create child span from incoming HTTP request
var span = try telemetry.startSpanFromTraceparent(
    "handle_request",
    request.getHeader("traceparent") orelse "",
    .{ .kind = .server },
);
defer span.deinit();

// If traceparent is invalid, creates a new root span
// If not sampled (flags=00), returns empty span

Best Practices ​

  1. Always set service identity: Configure service_name, service_version, and environment
  2. Use sampling in production: Set sampling_rate to 0.01-0.1 for high-throughput services
  3. Propagate context: Use W3C traceparent/baggage headers for cross-service tracing
  4. Handle span cleanup: Always use defer span.deinit() to avoid memory leaks
  5. Export regularly: Call exportSpans() periodically or at request boundaries
  6. Monitor export stats: Check getExporterStats() to detect export issues

Configuration Reference ​

See API Reference: Telemetry for complete configuration options.

Examples ​

See examples/telemetry.zig for a complete working example.

Released under the MIT License.