Data Scraper

The Data Scraper is the core data collection engine within the Mirox-Agent, actively retrieving real-time information from all monitored equipment at your plant. It connects to your loggers, inverters, meters and battery systems through a library of vendor-specific adapters, normalizes everything into one consistent metric vocabulary, and forwards the result to the rest of the platform — while running a growing set of edge analytics (performance ratio, curtailment tracking, clear-sky baselines, forecasting and network monitoring) directly at the plant.

Purpose and Role

The Data Scraper serves a single, focused purpose: actively collect raw measurements from equipment and forward them for processing. It acts as the bridge between diverse manufacturer equipment and the unified Mirox platform, translating proprietary data formats into standardized metrics.

Core Responsibilities:

  • Connect to data loggers and monitoring devices through vendor-specific adapters
  • Retrieve raw measurements on configurable schedules
  • Transform manufacturer-specific data into standardized metric format
  • Automatically discover and track installation components
  • Monitor component activity and operational status
  • Run edge analytics (expected power, performance ratio, curtailment, clear-sky and forecasts)
  • Inspect the plant's local network and audit device access
  • Forward metrics to the Time-Series Database and Digital Twin service
  • Report component health and operational status to the IoT Cloud

This separation of concerns keeps the Data Scraper lightweight, focused, and independently deployable.

Architecture Overview

The Data Scraper operates as an asynchronous, event-driven service where multiple data collection tasks run concurrently.

Key Architectural Principles:

  • Stateless: No persistent state between restarts
  • Adapter-Based: A dedicated, vendor-specific adapter speaks each device's protocol
  • Self-Healing: Automatic error recovery with exponential backoff
  • Concurrent: Each data source collected independently
  • Edge-Deployed: Runs on or near the plant, close to the equipment it reads
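As a rough illustration of the "Concurrent" principle, the sketch below runs one independent task per data source with Python's asyncio, so a failing source does not stop the others. The names `run_adapter` and `run_all` and the result format are assumptions made for this example, not the agent's actual code.

```python
import asyncio


async def run_adapter(name, collect, cycles, results):
    """Run one adapter's collection loop; errors stay inside this task."""
    for _ in range(cycles):
        try:
            results.append((name, await collect()))
        except Exception as error:  # a failing source must not stop the others
            results.append((name, error))
        await asyncio.sleep(0)  # yield to the event loop between cycles


async def run_all(adapters, cycles=1):
    """Start one task per adapter (name -> async callable) and wait for all."""
    results = []
    await asyncio.gather(
        *(run_adapter(name, collect, cycles, results)
          for name, collect in adapters.items())
    )
    return results
```

Calling `asyncio.run(run_all({"logger": read_logger, "meter": read_meter}))` with two hypothetical coroutine functions would collect from both sources concurrently.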

Network Access Requirements

The Data Scraper requires direct TCP/IP network access to data sources for communication. This typically means:

  • Direct Ethernet/WiFi connectivity to the device's IP address
  • Open network ports for the device's protocol (e.g., TCP 80/443 for HTTP and WebSocket APIs)
  • Proper network routing between the Data Scraper host and devices

If direct network access is not possible (e.g., isolated OT networks, air-gapped systems, serial-only devices), we may need to implement an intermediate data collector such as:

  • Third-party data logger with network connectivity
  • Protocol gateway (Serial-to-Ethernet, fieldbus bridge, etc.)
  • Custom hardware solution for specialized interfaces

Consult with our engineering team to evaluate connectivity options for your specific installation.

The Adapter System

Device-specific protocols → Adapter → Standardized data format

Concept

An adapter is a vendor-specific connector module that knows how to communicate with one particular family of device. Each adapter is hand-built and reverse-engineered against that device's own web, API, or database interface — there is no single "speaks-any-protocol" engine. The adapter system is the core extensibility mechanism: supporting a new device means adding a new adapter, which we do on request (see below).

Each adapter is a self-contained module responsible for:

  1. Connection Management - Establishing and maintaining communication
  2. Data Retrieval - Fetching measurements using the appropriate protocol
  3. Data Transformation - Converting to standardized metric format

All adapters inherit from a base class that provides health monitoring, automatic retry logic, exponential backoff on failures, metric validation, and status reporting to the IoT platform.
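The retry behaviour of the base class can be pictured roughly as follows. This is a minimal sketch, not the actual base class: the names `backoff_delay` and `collect_with_retry`, and the base delay, cap and attempt count, are assumptions chosen for illustration.

```python
import asyncio


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


async def collect_with_retry(collect, max_attempts: int = 5, base: float = 1.0):
    """Await the async `collect` callable, retrying with exponential backoff."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return await collect()
        except Exception as error:  # a real adapter would catch narrower errors
            last_error = error
            if attempt < max_attempts - 1:
                await asyncio.sleep(backoff_delay(attempt, base))
    raise last_error
```

With the assumed defaults the waits grow as 1 s, 2 s, 4 s, 8 s and are capped at 60 s, so a device that stays offline is not hammered with requests.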

Health Management

Each adapter implements an automatic state machine:

  • INITIALIZING: Starting up and establishing initial connections
  • HEALTHY: Operating normally with successful data collections
  • UNHEALTHY: Experiencing errors but attempting to continue
  • RECONNECTING: Performing recovery actions after repeated failures
  • FROZEN: The device is returning stale data — the same values repeat, or no new reading has arrived within the expected window
  • PAUSED: Temporarily paused by user command; resumes automatically when the pause expires

The system automatically transitions between states, reports status to the platform, and attempts recovery without manual intervention. The FROZEN state lets the platform distinguish a logger that is genuinely down from one that is merely repeating a stuck value.
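The states listed above could be modelled along these lines. The state names come from this page; the transition rules, the `reconnect_after` threshold and the function name `next_state` are assumptions made for the sketch.

```python
from enum import Enum


class AdapterState(Enum):
    INITIALIZING = "initializing"
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    RECONNECTING = "reconnecting"
    FROZEN = "frozen"
    PAUSED = "paused"


def next_state(ok: bool, consecutive_failures: int, stale: bool,
               paused: bool, reconnect_after: int = 3) -> AdapterState:
    """Pick the state after one collection cycle (hypothetical rules)."""
    if paused:
        return AdapterState.PAUSED
    if not ok:
        # Repeated failures escalate from UNHEALTHY to RECONNECTING.
        if consecutive_failures >= reconnect_after:
            return AdapterState.RECONNECTING
        return AdapterState.UNHEALTHY
    if stale:
        # The device answered, but with repeated or outdated values.
        return AdapterState.FROZEN
    return AdapterState.HEALTHY
```

Keeping the decision in one pure function makes each transition easy to test in isolation.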

Supported Devices

The Data Scraper ships with around fifteen vendor-specific adapters, each built for a particular device family. The list below reflects what is supported today and grows whenever a new device is integrated.

Device family | What it is | How it is read
Bluelog data logger | Meteocontrol-style logger (sensors + strings) | HTTP login plus a live WebSocket feed
SMA Sunny Central | Central inverter controller | Vendor HTTP API (with shutdown detection)
SMA Power Manager | SMA plant controller | Vendor HTTP API
Sungrow logger | Sungrow inverter data logger | Live WebSocket connection
Huawei SmartLogger | SmartLogger 1000 / 3000 / 4000 | HTTP web interface (zero-config onboarding)
Janitza meters | Power-quality meters | HTTP, no credentials (zero-config onboarding)
Phoenix Contact PLC | PLCnext / SPS controller | Vendor HTTPS REST API (zero-config onboarding)
Dexcon controller | Plant controller | Vendor HTTPS REST API
Zebotec | Inverters and sensors | Vendor HTTP API
FREQCON BESS | Battery storage system | Vendor HTTP plus a time-series query interface
PRTG | Network-monitoring server | PRTG HTTP API
Becker historian | Historian database (Microsoft SQL Server) | Microsoft SQL Server connection
Object storage / files | S3 or S3-compatible storage and local files | File scanning with CSV/Excel parsing and gap detection
Weather model | Open-Meteo weather + on-device PV power model | HTTP (clear-sky and irradiance modelling)

Real Transports in Use

Across these adapters, the actual communication methods are vendor HTTP/HTTPS APIs (most common), live WebSocket connections (Bluelog and Sungrow), Microsoft SQL Server (the Becker historian), S3 / file access, and a time-series query interface used only by the FREQCON battery adapter. There is no generic Modbus, MQTT, FTP/SFTP, or WebDAV ingestion — each integration is purpose-built for its device.

Creating New Adapters

New adapters can be developed to support additional devices or protocols. The modular design and base class functionality significantly reduce development time.

Compatibility with Legacy Equipment: We can create adapters for older devices that were never specifically designed for data export. As long as the device provides its data in any accessible way—whether through a REST API, web interface, database, file system, or any other mechanism—we can extract and integrate that data into the platform.

Unrestricted Data Collection: Our adapters are not limited to the pre-defined data-export formats that data loggers typically provide. We can collect any data that the device makes available, going beyond the standard set of metrics a manufacturer's logger might expose. If a device has additional diagnostic information, advanced parameters, or hidden data points accessible through its interface, we can retrieve and standardize them.

Custom Adapters on Demand

We can create new adapters for virtually any data source at any time upon customer request. The adapter system is designed for rapid extensibility—new protocol support can typically be implemented within days depending on complexity. If you have equipment from a manufacturer not yet supported, contact us to discuss custom adapter development.

No Vendor Documentation Required

Adapter development does not strictly require vendor API documentation. Through network traffic analysis, protocol reverse engineering (where legally permitted), and empirical testing, we can often create functional adapters even for devices with undocumented interfaces. This capability is particularly valuable for legacy equipment or systems with proprietary protocols.

Onboarding Through the Platform

For a subset of devices, you can bring a logger online from the platform without hand-writing any configuration. An onboarding wizard asks the agent to dry-run-connect to the device and streams the live probe results back to you, so you see immediately whether the connection works before committing it. There are two flavours:

  • Zero-config onboarding — the adapter already owns the device's full reading set, so the wizard just shows a read-only live preview and you save. Available today for Janitza meters, Huawei SmartLogger, and Phoenix Contact controllers. Janitza needs no credentials at all.
  • Interactive mapping for generic loggers — some loggers expose arbitrary raw values the platform can't interpret on its own. For these the wizard asks the agent to enumerate every raw value the device exposes (group, name, unit, live sample), the operator maps each one to a known metric, and a mapping-driven dry run previews the exact metrics that would be produced before saving. QReader is the first generic logger supported this way.

Not Universal Plug-and-Play

Only the device families above are wizard-onboardable today (the three zero-config families plus generic loggers like QReader). All other adapters still require a per-device configuration delivered with the agent, so treat onboarding automation as device-specific rather than universal.

Metric Standardization

All collected data is transformed into a standardized metric format defined by the platform's metric taxonomy. This ensures consistency across all data sources and enables unified processing downstream.

Metric Structure

Each metric follows a standardized structure compatible with modern time-series databases:

Components:

  • Name: Standardized metric identifier from predefined taxonomy
  • Value: Numeric measurement in the platform's standard unit for that quantity (see Unit Conventions)
  • Labels: Key-value pairs for component identification and grouping
  • Timestamp: Optional preservation of original device timestamp

Standard Labels:

  • Source adapter type and instance number
  • Human-readable names
  • Component identifiers (inverter ID, string number, etc.)
  • Physical location or grouping information
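A metric with these four components might be represented as shown below. This is only a sketch: the class name `Metric`, the `series_key` helper, and the example metric and label names are hypothetical and not taken from the platform's taxonomy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Metric:
    name: str                          # identifier from the metric taxonomy
    value: float                       # numeric measurement
    labels: dict = field(default_factory=dict)  # component identification
    timestamp: Optional[float] = None  # original device time, if preserved

    def series_key(self) -> tuple:
        """Identify the time series: name plus labels, independent of label order."""
        return (self.name, tuple(sorted(self.labels.items())))


# Hypothetical example: one inverter power reading.
reading = Metric(
    name="inverter_power",
    value=48250.0,
    labels={"adapter": "example", "inverter_id": "INV-03"},
    timestamp=1700000000.0,
)
```

Two readings with the same name and the same labels share a `series_key`, which is what a time-series database typically uses to decide that samples belong together.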

Unit Conventions

All metrics use one fixed, unprefixed unit per quantity, regardless of what the manufacturer's device reports:

  • Power: Watts (W)
  • Energy: Watt-hours (Wh)
  • Voltage: Volts (V)
  • Current: Amperes (A)
  • Temperature: Celsius (°C)
  • Irradiance: Watts per square meter (W/m²)

Adapters automatically convert from manufacturer-specific units (kW, MWh, etc.) to these standards during the transformation phase.
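The conversion step can be sketched as a simple lookup of scale factors. The table contents and the function name `to_standard` are assumptions for illustration; the real adapters may handle more units and do so differently.

```python
# Source unit -> (standard unit, multiplication factor). Illustrative subset.
UNIT_FACTORS = {
    "W": ("W", 1.0), "kW": ("W", 1e3), "MW": ("W", 1e6),
    "Wh": ("Wh", 1.0), "kWh": ("Wh", 1e3), "MWh": ("Wh", 1e6),
    "V": ("V", 1.0), "kV": ("V", 1e3),
    "A": ("A", 1.0),
}


def to_standard(value: float, unit: str) -> tuple:
    """Convert a reading to the standard unit; raises KeyError for unknown units."""
    target_unit, factor = UNIT_FACTORS[unit]
    return value * factor, target_unit
```

For example, `to_standard(2.5, "kW")` yields 2500.0 W. Failing loudly on an unknown unit is a deliberate choice in this sketch, since a silently unscaled value would be harder to spot downstream.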

Metric Categories

The platform defines 376 standardized metric types organized into 11 families:

FamilyCountWhat it covers
Powerplant156Grid, AC output, inverters, combiner boxes, strings, irradiation
Battery86Battery box, storage, module and cell measurements
Weather46Weather inputs and measurements
Weather Model16Modelled PV production from weather
Network SNMP16SNMP readings from network devices
Agent19Agent self-telemetry and health
Network Monitor11Local-network monitoring measurements
AI Usage11AI feature usage at the edge
Operator7Operator-fleet telemetry
Network4Basic connectivity
Scraper4Data Scraper self-metrics

Label expansions (per-string, per-phase, per-inverter, and so on) multiply these into far more individual time series at a real plant. For the complete metric taxonomy and definitions, see Metric Collection.

Data Collection Flow

Polling Strategies

Adapters support two polling modes:

Interval-Based (default): Executes every N seconds after the previous collection completes. Simple and responsive to varying collection durations.

Static Time-Based: Executes at fixed intervals counted from midnight, with an optional offset (e.g., at 00:01, 00:06, 00:11 for 5-minute intervals with a 1-minute offset). Useful for alignment with external systems.
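The difference between the two modes comes down to how the next run time is computed. The sketch below works in seconds since local midnight; the function names are assumptions, and wrap-around at the end of the day is left out for brevity.

```python
import math


def next_interval_run(previous_finished: float, interval: float) -> float:
    """Interval-based: wait `interval` seconds after the last collection ended."""
    return previous_finished + interval


def next_static_run(now: float, interval: float, offset: float = 0.0) -> float:
    """Static time-based: next slot strictly after `now`.

    Slots lie at offset + k * interval seconds since midnight (k = 0, 1, 2, ...).
    """
    k = math.floor((now - offset) / interval) + 1
    return offset + k * interval
```

With a 5-minute interval (300 s) and a 1-minute offset (60 s), the slots fall at 60 s, 360 s, 660 s and so on after midnight, independent of how long each collection takes.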

Processing Pipeline

After collection, metrics pass through several processing stages:

Metric Preparation: Source labels are added, timestamps applied, and structure validated.

Filtering: Configured filters can modify values, validate ranges, or skip metrics based on rules.

Calculations: Automated calculators derive additional metrics:

  • Solar radiation power integrated to irradiation energy
  • String voltage × current calculated to power
  • Power values integrated to energy over time
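Two of these calculators can be sketched in a few lines. The integration method used by the agent is not stated here, so the trapezoidal rule below is an assumption, as are the function names.

```python
def string_power(voltage_v: float, current_a: float) -> float:
    """DC string power in watts from voltage and current."""
    return voltage_v * current_a


def integrate_to_wh(samples) -> float:
    """Integrate (timestamp_seconds, watts) samples to watt-hours.

    Uses the trapezoidal rule between consecutive samples (an assumption).
    """
    total_watt_seconds = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total_watt_seconds += (p0 + p1) / 2.0 * (t1 - t0)
    return total_watt_seconds / 3600.0
```

A constant 1000 W held for one hour integrates to 1000 Wh, which is a quick sanity check for any implementation. The same function applies to irradiance samples in W/m², giving irradiation in Wh/m².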

Component Discovery: As metrics flow through, the Data Scraper automatically discovers and identifies installation components. This is a crucial feature—since the Data Scraper is the layer that actively collects data, it inherently knows which components exist and are providing data. The system automatically discovers:

  • Inverters (from inverter power metrics)
  • String combiner boxes / GAKs (from GAK metrics)
  • Individual strings (from string voltage/current metrics)
  • Irradiation sensors (from radiation metrics)
  • Grid connection points (from grid energy metrics)

Discovered components are synchronized to the IoT platform for inventory management, creating a real-time, self-maintaining equipment registry without manual configuration.

Component Activity Tracking

Because the Data Scraper continuously polls data sources, it can detect which components are actively providing data and which have stopped responding. When a component stops sending metrics, the Data Scraper marks it as inactive in the IoT Cloud. When it resumes, it's automatically marked active again. This provides real-time awareness of equipment operational status—not just whether the Data Scraper can reach the data logger, but whether individual components within the installation are functioning and reporting data.
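Activity tracking of this kind can be approximated with a last-seen timestamp per component. The class below is a sketch under assumptions: the name `ActivityTracker`, the timeout value, and the idea of returning only status changes are illustrative choices, not the agent's actual logic.

```python
class ActivityTracker:
    """Track when each component last delivered a metric."""

    def __init__(self, timeout_s: float = 300.0):
        self.timeout_s = timeout_s
        self.last_seen = {}  # component -> time of last metric
        self.active = {}     # component -> last reported status

    def observe(self, component: str, now: float) -> None:
        """Record that `component` delivered data at time `now`."""
        self.last_seen[component] = now

    def update(self, now: float) -> list:
        """Return (component, is_active) pairs whose status changed."""
        changes = []
        for component, seen in self.last_seen.items():
            is_active = (now - seen) <= self.timeout_s
            if self.active.get(component) != is_active:
                self.active[component] = is_active
                changes.append((component, is_active))
        return changes
```

Reporting only changes keeps the traffic to the component registry small: a component that stays active produces no repeated updates.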

Production Detection: The system monitors plant operational state:

  • Detects when production begins based on irradiance and power
  • Identifies unexpected shutdowns during production hours
  • Reports state transitions for alerting

Metric Grouping: Metrics are batched by time series to optimize database insertion performance.
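Grouping by time series can be pictured as collecting all samples that share a metric name and label set. The tuple layout and the function name `group_by_series` are assumptions for this sketch.

```python
from collections import defaultdict


def group_by_series(samples) -> dict:
    """Group (name, labels, timestamp, value) samples by time series.

    The series key is the metric name plus its sorted label pairs.
    """
    groups = defaultdict(list)
    for name, labels, timestamp, value in samples:
        key = (name, tuple(sorted(labels.items())))
        groups[key].append((timestamp, value))
    return dict(groups)
```

Each group can then be written in one insert per series instead of one insert per sample.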

Network Traffic Optimization

After metric grouping and batching, the Data Scraper applies additional compression before transmitting data to the Mirox-Cloud. This significantly reduces network traffic volume, which is particularly beneficial when internet bandwidth is limited or metered. For more details on bandwidth considerations, see On-Site Deployment.
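The effect of compressing a batch before transmission can be demonstrated with the standard library. The compression scheme the agent actually uses is not specified here; gzip over compact JSON is an assumption chosen only to illustrate the idea.

```python
import gzip
import json


def compress_batch(batch: list) -> bytes:
    """Serialize a batch to compact JSON and gzip it."""
    raw = json.dumps(batch, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decompress_batch(payload: bytes) -> list:
    """Reverse compress_batch on the receiving side."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))
```

Metric batches tend to compress well because metric names and labels repeat across samples.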

Data Export

Processed metrics are forwarded to two destinations:

Time-Series Database: Metrics are pushed in batches with rate limiting and retry logic for long-term storage and historical querying.

Digital Twin Webhook: A separate background task continuously forwards the latest metric values to the Digital Twin service (a completely separate microservice) for real-time analysis. The Data Scraper has no knowledge of what the Digital Twin does with the data—it simply provides the metrics. For information about Digital Twin processing, see Digital Twin.

Stateless Operation

The Data Scraper is designed to be completely stateless:

  • No persistent storage—all state exists only in memory
  • Can be stopped and restarted without data loss
  • Multiple instances can run independently for different parks
  • Each polling cycle is independent of previous cycles
  • Crash-resilient with no risk of corrupting persistent state

The only persistent state exists externally:

  • Configuration files (version-controlled)
  • Time-Series Database (external system)
  • IoT Cloud component registry (external system)

This design ensures operational simplicity, reliability, and easy horizontal scaling.

Separation of Concerns

The Data Scraper has a narrow, focused responsibility that enables clear separation from other platform components:

Data Scraper:

  • Collects raw measurements from equipment
  • Transforms data to standard format
  • Discovers and tracks components
  • Monitors component activity status
  • Forwards metrics to other services

Digital Twin: Validates against physics models and detects anomalies and losses

Time-Series Database: Stores historical data, provides query interface

IoT Cloud: Maintains component registry, tracks device status, manages equipment inventory

This separation enables independent development, testing, deployment, and scaling of each component while ensuring each service focuses on its core competency.

Advanced Features

Automatic Health Monitoring

Each adapter implements a state machine that tracks operational health with automatic reporting to the platform and exposure via the metrics API for operational monitoring.

Automatic Component Discovery

The Data Scraper's position as the data collection layer gives it a unique advantage: it inherently knows which components exist at an installation because it directly interacts with the metrics they produce. As metrics flow through the system, components are automatically discovered from metric labels and registered with the IoT platform.

Discovery Process:

  1. Metrics arrive with identifying labels (inverter ID, string number, sensor location, etc.)
  2. The Data Scraper extracts component information from these labels
  3. New components are automatically registered with the IoT Cloud
  4. Component metadata (type, identifier, location) is synchronized
  5. The platform maintains an up-to-date equipment inventory without manual entry
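Steps 1 to 3 of the discovery process can be sketched as a scan over metric labels. The label names in `LABEL_TO_COMPONENT` and the function name `discover` are hypothetical; the real taxonomy defines its own labels.

```python
# Hypothetical mapping from a metric label to the component type it identifies.
LABEL_TO_COMPONENT = {
    "inverter_id": "inverter",
    "string_id": "string",
    "sensor_id": "irradiation_sensor",
}


def discover(metric_labels, known: set) -> set:
    """Return (type, identifier) pairs seen in labels but not yet in `known`."""
    new_components = set()
    for labels in metric_labels:
        for label, component_type in LABEL_TO_COMPONENT.items():
            if label in labels:
                component = (component_type, labels[label])
                if component not in known:
                    new_components.add(component)
    return new_components
```

The returned set is what would be registered with the component registry; components already known are skipped so registration happens once.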

This self-discovery mechanism ensures the platform always knows what equipment exists at the installation, eliminating the need for manual configuration and reducing deployment time.

Production State Detection

The service monitors plant operational state and detects production starts, unexpected shutdowns during production hours, and state transitions for alerting and analysis, reporting only when the state actually changes. It also watches for overproduction, meaning output above the clear-sky model for a sustained period. Sustained overproduction can indicate a logger that is returning frozen values; in that case the service falls back to the clear-sky model so the data stream stays sensible. This provides real-time operational awareness beyond just raw measurements.

Calculated Metrics

Several calculators automatically derive metrics from raw measurements—solar radiation integrated to irradiation energy, string power calculated from voltage and current, and power values integrated to energy over time. These calculations happen transparently, enriching the data stream without requiring explicit configuration.

Edge Analytics

Beyond raw collection, the Data Scraper runs a set of analytics directly at the plant, computed from the live metric feed and exported as chartable time series alongside the raw data.

Expected Power and Performance Ratio

The agent computes the expected power for each plant from its irradiance sources (on-site pyranometer, satellite radiation, or weather model) and compares it against actual output to derive the performance ratio (PR) — a normalized measure of how well the plant is converting available sunlight into electricity. PR is calculated per day and accounts for clipping, frost, and data-quality filters so that anomalous days do not distort the result.
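A simplified version of the calculation is sketched below. It assumes that expected power scales linearly with irradiance relative to 1000 W/m² and uses a single low-irradiance cutoff as the only data-quality filter; the agent's actual model, including its clipping and frost handling, is not reproduced here, and all names are illustrative.

```python
STC_IRRADIANCE = 1000.0  # W/m², reference irradiance assumed for this sketch


def expected_power(irradiance_w_m2: float, peak_power_w: float) -> float:
    """Expected output, assuming linear scaling with irradiance."""
    return irradiance_w_m2 / STC_IRRADIANCE * peak_power_w


def daily_performance_ratio(samples, peak_power_w: float,
                            min_irradiance: float = 50.0):
    """PR for one day from equally spaced (irradiance, actual_power) samples.

    Samples below `min_irradiance` are ignored (a simple quality filter).
    Returns None when no usable samples remain.
    """
    actual = 0.0
    expected = 0.0
    for irradiance, power in samples:
        if irradiance < min_irradiance:
            continue
        actual += power
        expected += expected_power(irradiance, peak_power_w)
    return actual / expected if expected > 0 else None
```

Because the samples are equally spaced, the ratio of summed powers equals the ratio of energies, so no explicit time step is needed.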

Live Curtailment Tracking

When a plant produces less than it could, the agent attributes the foregone production to its cause: curtailment by the marketer (a deliberate market-driven cap) versus curtailment by the grid operator. This distinction matters for loss accounting and contractual reporting. Curtailment is tracked per minute and emitted as both instantaneous power and cumulative energy. See Loss Detection for how curtailment fits into overall loss attribution.
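Per-minute attribution and the cumulative energy figure can be sketched as follows. How the agent determines the cause of a curtailment is not described here, so the sketch takes the cause as a given input; the cause names and the function name are assumptions.

```python
def accumulate_curtailment(minutes) -> dict:
    """Sum curtailed energy (Wh) per cause from per-minute samples.

    Each sample is (available_w, actual_w, cause), where cause is
    "marketer", "grid_operator", or None when no curtailment is active.
    """
    energy_wh = {"marketer": 0.0, "grid_operator": 0.0}
    for available_w, actual_w, cause in minutes:
        if cause is None:
            continue
        curtailed_w = max(0.0, available_w - actual_w)  # instantaneous power
        energy_wh[cause] += curtailed_w / 60.0          # one minute of it, in Wh
    return energy_wh
```

Curtailing 600 W for two minutes accumulates 20 Wh, which shows how the instantaneous power and the cumulative energy relate.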

Clear-Sky Baseline and Forecasting

  • Clear-sky baseline: a theoretical ideal-conditions PV curve maintained as a long-term record, giving you a stable reference to compare real output against.
  • Day-ahead forecast: a short-horizon PV production forecast derived from weather data, so you can anticipate the next day's output.

Both are weather-physics based, not statistical guesses.

Historical Backfill

When a plant is first connected or after a gap, the agent can backfill historical data — replaying raw readings and re-deriving the analytics above for a requested window, then handing the result to the live pipelines so charts are complete from day one rather than starting empty.

Network Monitoring

The Data Scraper includes a built-in local-network inspector that runs alongside data collection and maps the plant's on-site network. It discovers devices on the configured network ranges by sweeping for reachable hosts, reading the address table, identifying vendors from hardware addresses, and probing devices for their identity. Discovered devices are classified against a large library of known network-equipment profiles, and an AI device-identification step helps recognize device families that simple rules miss.

Once devices are known, the inspector polls them for reachability and health (response time, interface and resource status) and reports the results to the platform. You can trigger or stop a discovery scan, re-check an individual device, and review the discovered network — see Local Network Inspector.

Proxy Auditing

For plants reachable through the Mirox Browser Proxy, the agent audits human access to local device interfaces. It groups each person's activity into sessions, redacts sensitive query data before anything is stored, and can produce an AI-generated summary of what a session did. This feeds the platform's access audit trail — see Access Audit Log.
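Redacting sensitive query data before storage might look like the sketch below. The list of sensitive parameter names and the function name `redact_url` are assumptions; the agent's actual redaction rules are not documented on this page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameter names treated as sensitive.
SENSITIVE_KEYS = {"password", "token", "session", "key"}


def redact_url(url: str) -> str:
    """Replace the values of sensitive query parameters with REDACTED."""
    parts = urlsplit(url)
    query = [
        (key, "REDACTED" if key.lower() in SENSITIVE_KEYS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Redacting at the point of capture means the sensitive value never reaches the stored audit record.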

Performance Characteristics

Typical Performance:

  • Polling frequencies: 1-300 seconds per adapter (configurable)
  • Concurrent adapters: 20+ running simultaneously
  • Throughput: 10,000+ metrics per minute sustained
  • Latency: Sub-100ms from collection to database insertion
  • Resource usage: 5-15% CPU, 100-500 MB memory on edge hardware

The asynchronous architecture ensures high concurrency without blocking, enabling efficient collection from many sources simultaneously.

Related Features

  • Digital Twin — the physics-based analysis engine that consumes the metrics the Data Scraper collects
  • Mirox-Agent Overview — how the Data Scraper fits within the wider edge agent
  • Deployment Options — on-site versus cloud deployment trade-offs for the agent
  • Metric Collection — the full standardized metric taxonomy
  • Local Network Inspector — the on-site network monitoring surface
  • Loss Detection — how curtailment and other losses are attributed
MIT Licensed | Copyright 2026 Mirox Verwaltungs GmbH