1. Introduction
In the last few years, the Internet of Things (IoT) has been gaining significant traction and proliferation in domestic environments. Marketed as “smart home” systems, IoT platforms aim to automate the operation of common household appliances and to make controlling them more convenient. IoT products targeted at the consumer market are typically connected via a remote cloud-hosted server, commonly run by the device manufacturer. These online services offer limited interfaces for the end user to control their devices, often no more than a manufacturer-provided proprietary smartphone app, sometimes accompanied by voice control services. While this implementation satisfies the use cases of some customers, at least initially, many users may wish to control their devices fully locally within their own home network without the need to connect to an external server hosted on the public internet. However, most commercially available IoT devices do not offer such an application programming interface (API) to facilitate local control, instead running closed-source device firmware that makes control possible solely via the aforementioned manufacturer cloud service and associated apps.
There are a multitude of reasons why a user might prefer to control their devices locally, of which some of the most prominent are outlined below:
First, all security considerations are constrained to the network to which the device is connected and its immediate physical vicinity. While refraining from connecting to any external server alone does not adequately protect a device, it significantly reduces the number of possible attack vectors. Even if a manufacturer takes care to implement state-of-the-art security measures, regular updates might not be provided down the line, or may not be installed by users. Given that the device firmware, control software used on cloud servers, and end-user apps are often all proprietary, it becomes impossible to independently ascertain whether or not the application contains critically exploitable bugs or even deliberate backdoors. In contrast, open-source software allows the user to verify the codebase, as well as to modify it if desired. This allows for independent review, meaning that it is no longer necessary to place unconditional trust in the manufacturer in terms of either their benevolence or their ability to produce safe and bug-free code when allowing their device to connect to one’s internal network. Even in the suboptimal case of the device firmware being closed source, the existence of a feasible local control mechanism makes it possible to block the device from connecting to the internet, which significantly limits the possible attack surface available to exploit the device or even gain unauthorized access to the internal network itself.
Privacy is another common concern. Remotely controlled IoT devices usually require registering an account with the manufacturer, necessitating the disclosure of personally identifiable data (typically at least an e-mail address). This puts the user at risk in the event of the IoT control server being compromised. In particular, the user’s e-mail address might be exposed to third parties, which could lead to spam and phishing attacks. Ensuring an appropriate level of privacy protection is particularly essential with devices that are capable of recording and monitoring highly personal living spaces in a home, such as security cameras and voice assistants.
Furthermore, if storage in the server’s database is not implemented according to best practices, the user’s credentials—most commonly a password—might be trivial to obtain, a particular concern when users re-use the same password on multiple services.
Another reason to prefer local control is latency. Sending a network packet within one’s local network typically takes less than 50 ms, which the user perceives as instantaneous. In contrast, a remote manufacturer server may be located in another country, in which case sending the same packet may take about half a second. This severely impacts the user-perceived responsiveness of the device, as delays become perceptible at 200 ms at most (cmp. [1], p. 32).
Availability is another major concern. If the IoT control server is unreachable or the user’s internet connection is disrupted, control of the devices will not be possible.
Another essential factor is longevity. After an IoT product has reached the end of its commercial availability, the manufacturer might opt to disable servers or support for that device. This could also occur due to manufacturer bankruptcy. If the remote IoT control server ceases to function, the device is no longer controllable; depending on its design, it may be severely limited in its functionality or even rendered entirely inoperable. One example is the home security cameras and systems provided by Nest Secure, which have been disabled since April 2024 [2]. Such shutdowns are not only costly for users, who have to replace the device, but are also environmentally unsustainable, as devices that are in perfect physical working order must be disposed of merely because the means of controlling them is no longer functional.
Flexibility is another advantage. Proprietary solutions often require the user to install a mobile app for each vendor, which consumes limited mobile device resources and leads to a disjunct system composed of a multitude of different apps and accounts on the user’s mobile phone to control many devices from different vendors. With a sufficiently open local API, advanced automation via software such as Home Assistant or other custom integrations becomes a straightforward possibility. Such options are typically severely limited with proprietary servers. This often results in vendor lock-in, where a user is unable to automate a process despite owning hardware that has the technical capability to do so. As a simple example, consider a motion detector and a light. If only one of these two devices uses a proprietary app and there is no way to interface with it over a common API or compatibility adapter, even a simple automation use case such as turning on the light if motion is detected cannot be implemented using these components. This example highlights the importance of open—and preferably local—APIs, as they are a base prerequisite for effective home automation. Without such basic provisions for interoperability, deeming IoT devices “smart” may be considered an overstatement.
Remote access is one of the key reasons manufacturers often prefer remote server control over local control. It is a selling point to be able to, e.g., turn on the heat at home before one leaves work, or check that all the lights are off after leaving the house. However, remote and local control APIs are not mutually exclusive, so a decision to not offer a local API might be due to the manufacturer desiring a higher degree of control or reduced implementation complexity by foregoing the need for, e.g., an embedded web server on the device. Philips Hue is a positive example of a large-scale IoT ecosystem that offers both remote and local APIs.
Finally, ownership of hardware should empower users to use the device as they see fit. The possibility of local control is essential to ensuring non-revocable freedom of use.
Methods for Proliferating Local Control
In 2024, the Open Home Foundation was created to promote the goals of privacy, choice, and sustainability (cmp. [3]) with the vision of enabling a truly open and local smart home system. The WLED project, which will be used for the main example implementations in this work, is a member of the Open Home Foundation.
Open-source software is a key element in making IoT devices more accessible, including with regard to local control; when a device runs open-source firmware, users are free to customize its software behavior to their specific needs, which can include adding APIs to enable local control.
Consumer behavior is essential for long-term improvement of implemented IoT solutions. If customers are sensitized to the issue and prefer to purchase devices that allow local control or are fully open-source, this is likely to influence manufacturer behavior, leading them to offer more flexible means of using their devices.
Web technology has evolved rapidly in the last few years, particularly the JavaScript APIs present in modern browsers, to the point where it is often feasible to replace proprietary native apps with open web-based apps that offer cross-platform compatibility with almost all internet-connected end-user devices that have a standard web browser installed. This could be expanded further in the future to offer built-in non-TLS means for browsers to communicate securely with low-resource, locally connected IoT devices.
The goal of this research is to identify and analyze methods for offering a secure local means of control for constrained IoT devices, with particular focus on browser-based web apps, i.e., those that do not require the installation of custom software on clients’ control devices. As an example implementation, the ESP8266 and ESP32 microcontrollers by Espressif are considered due to their widespread use in both homemade and commercial IoT devices. For demonstration, the open-source lighting control project WLED is used; see Section 2.9.
In the following, the technical background is first provided in Section 2; then, the novel approach for secure and lightweight browser-based device control is outlined in Section 3.
2. Technical Background
Herein, fundamental concepts used for the subsequent implementation are outlined and the necessary prerequisites are established. These include both the security objectives that the solution aims to achieve and possible approaches to implementing cryptographic protection measures in a web browser context.
2.1. System and Threat Model
It is useful to first establish an overview of the local IoT control system and the security objectives guaranteed by the implemented solution.
By definition, a direct user-to-device IoT control system involves two core participants: the user and the IoT device. As the user utilizes a standard web browser, they assume the client role, while the device acts as an HTTP server. We define a third participant, a trusted server used to download JavaScript code to be executed securely in the user’s browser. The reason this server is required for our approach is outlined in more detail in
Section 2.3. There are two dedicated external communication channels: first, an insecure HTTP and WebSocket connection between user and IoT device, and second, a trusted TLS connection between the user and the trusted external server. Additionally, there is a third notable communication channel internal to the user’s browser enabling message exchange between the two browser windows, one with the page served by the trusted server and the second served insecurely by the IoT device server (see
Section 2.5). This channel is also regarded as insecure, since it can be viewed as an extension to the external HTTP channel with the IoT device and as such is indirectly subject to similar security constraints. The user and IoT device are in control of a pre-shared key (PSK) that is established through a secure out-of-band method, e.g., a hardware serial bus. Lifecycle management of the PSK is considered out-of-scope.
We assume an active network attacker in the channel between the user and the IoT device. The attacker has full knowledge of the system design along with the ability to intercept, block, read, modify, replay, and send arbitrary messages, i.e., it aligns with the adversary specified in the Dolev–Yao model [
4] (cmp. [
5], Section 1.3). As such, it is able to mount man-in-the-middle (MitM) attacks. By extension, the attacker has the same capabilities for the internal browser inter-window message channel. The attacker is unable to break cryptographic primitives and has no knowledge of the user’s password or the derived PSK; thus, it is unable to forge valid message authentication codes for a chosen plaintext. Furthermore, the attacker is assumed to be incapable of interfering with the TLS connection between the user and trusted external server in any way. While a denial-of-service attack would be possible, it is regarded as out-of-scope in this threat model. On the user side, the attacker has no direct control over code downloaded to or executed in the user browser, with the crucial exception of JavaScript downloaded directly from the IoT device. In particular, the attacker is assumed to be incapable of modifying the user’s browser, device hardware, or software, adding certificates to the user’s truststore, exploiting hardware access to the user device, or using social engineering to influence the user’s behavior. Similarly, on the device side, the attacker does not have hardware access to the IoT device; the attack surface is limited to the HTTP control network. The initial PSK agreement between the user and device occurs securely out-of-band.
Regarding the security goals (cmp. [6], pp. 303–304), keeping control commands secret may not be necessary for low-risk smart devices that do not handle personal data, such as lighting equipment; thus, confidentiality protection is not unconditionally required. However, the proposed approach must prevent illegitimate or modified control commands from being accepted by the IoT device; therefore, integrity and authenticity protection are regarded as essential. Furthermore, mitigation against replay attacks is crucial to preventing an attacker from re-sending previously captured legitimate control messages at a later time. Availability is also important, though only limited measures can be taken against attacks that intercept and discard messages or against denial-of-service (DoS) attacks. While some mitigations such as VPN tunnels, redundant channels, and firewalls are compatible with our approach, they complement it and are not considered in detail in this work. Non-repudiation is also not critical in a typical end-user IoT environment; omitting this requirement enables the use of lightweight and purely symmetric mechanisms based on a pre-shared key. In conclusion, for this IoT application scenario, only authenticity and integrity with replay attack mitigation are considered fundamental security goals.
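The replay mitigation goal can be illustrated with a strictly increasing message counter. The following is a minimal sketch under stated assumptions and not necessarily the mechanism used in the PoC: the class name ReplayGuard and the counter field are hypothetical, and the counter is assumed to be covered by the message authentication code so that an attacker cannot alter it unnoticed.

```javascript
// Hypothetical replay guard: a message is accepted only if its counter is
// strictly greater than the highest counter accepted so far.
// Assumption: the counter is part of the MAC-protected message data.
class ReplayGuard {
  constructor() {
    this.lastAccepted = -1; // no message accepted yet
  }

  // Returns true and records the counter if the message is fresh.
  accept(counter) {
    if (!Number.isSafeInteger(counter) || counter <= this.lastAccepted) {
      return false; // replayed, reordered, or malformed
    }
    this.lastAccepted = counter;
    return true;
  }
}

const guard = new ReplayGuard();
console.log(guard.accept(1)); // true
console.log(guard.accept(2)); // true
console.log(guard.accept(2)); // false (replay of an already accepted message)
```

A counter of this kind requires the device to persist only a single integer, which suits constrained hardware, at the cost of rejecting legitimately reordered messages.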
A visual overview of the system model is provided in
Figure 1. Note that the functionality “Control UI” is listed in both tabs; in practice, user interface elements could be defined in either tab. In the PoC, the entire UI is part of the insecure tab hosted by the device. In order to fully mitigate injection of attacker-chosen control commands through the UI, the UI code must be hosted by the trusted external server and form part of the secure crypto tab (see
Section 5.3).
2.2. Web-Based Apps as an Alternative to Native Apps and Browser-Based Cryptography
In order to successfully design a secure local IoT system controlled exclusively via a web app running in a standard browser, it is first necessary to gather an overview of the cryptographic functionalities present in modern browsers and their associated limitations.
The primary means of control for most commercially available IoT devices for home use are native smartphone applications. While major mobile operating system software development kits (SDKs) offer a high degree of flexibility for the application developer in terms of the interfaces and device features they can utilize, they can have drawbacks for the end user. Because the source code is typically not accessible, users have to trust manufacturers and distributors not to abuse the power granted to them. Both modern operating systems and browsers implement a permission system to let the user decide which device functionality and data the app or website is allowed to access. A key problem with this system is that it is often not clear to the user why and in which scope particular permissions are required for the app to function; the choice given to them is often not granular enough. A specific example pertinent to IoT control apps is location access. Many WiFi-based devices open an access point (AP) for initial setup. The app then scans for and connects to this AP to provision the device with the credentials for the user’s home WiFi network. In recent versions of Android, the location permission is required to connect to WiFi networks within the app, as network SSIDs could theoretically be referenced against publicly available SSID maps such as WiGLE; thus, a likely location of the user is inferred. Therefore, granting location permission is often required for initial device setup. However, this allows the app to store the location of the user during setup or even to continually track them, which constitutes data collection that for most categories of smart home devices (such as lights, air conditioning, and cameras) is not strictly necessary. As such, this not only fails to adequately protect user privacy but also risks non-compliance with laws enacted to protect individual privacy, such as the General Data Protection Regulation (GDPR) in the EU.
If control of an IoT device from user end devices is desired without requiring additional apps, web browsers can be a viable choice. Virtually all computers and smartphones have modern browsers pre-installed, as do some specialized devices such as e-book readers. CSS offers a rich toolkit to create visually appealing user interfaces and JavaScript offers the majority of features needed to implement client-side logic, to the point that many natively installable apps are merely wrappers around a web page. This approach is used by the WLED project as well as by other widely used applications such as the chat platform Discord. An even more lightweight alternative consists of progressive web apps (PWA), which appear as native apps to the user, for instance by having a dedicated app icon, running full-screen without visible browser UI, and being able to run offline using caching [
7]. Additionally, the client-side code of a website or PWA is in most cases open-source by definition, as JavaScript is an interpreted language that is executed directly from the corresponding source code. While code obfuscation and minification (that is, removing formatting and comments and shortening variable and function names) are possible and often advisable to reduce the size of the page when transferred over the network, such minified JavaScript can still be analyzed and customized more readily than an assembled and packaged binary application. Another feature built into modern browsers that is highly useful for secure local IoT control is the Web Crypto subtle API, which implements select cryptographic primitives for in-browser use. However, one first has to carefully consider whether cryptography code downloaded from the device’s embedded web server and executed in the browser can be trusted; this is particularly the case with devices that do not support transport encryption.
2.3. Secure JavaScript Cryptography Code from Untrusted Origins
The theory of the so-called “Browser Cryptography Chicken and Egg Problem” [
8] states that “if you can’t trust the server [or message transport] with your secrets, then how can you trust the server [or message transport] to serve secure crypto code?” [
9]. Essentially, in a server and client architecture without TLS, the web page is sent to the client unencrypted. In that case, there is no easy way to implement client-side cryptographic mechanisms without TLS communication, as all page JavaScript comes from the server and could be subject to a MitM attack. Therefore, it is inherently untrustworthy, and any trust in JavaScript code run on the client side would first need to be established using either built-in browser functionality or external HTTPS servers.
Using external servers for downloading or verification of certain client-side code is less than ideal, as some of the availability and privacy concerns involved in full remote device control still apply. However, it is deemed preferable to remote control, as no user accounts are required and caching—as is possible when deployed as a PWA—may allow the solution to continue working in temporary offline conditions. Moreover, users could host such a server themselves on more performant hardware.
2.4. Secure Contexts in Browsers
An important consideration in browser-based cryptography involves secure contexts. A multitude of powerful browser APIs are only available in secure contexts, i.e., in essence when the page is loaded via TLS. This is meant to protect personal data, as pages outside a secure context are denied access to potentially sensitive APIs such as the camera, location, and microphone. One notable API that is also restricted to secure contexts is the aforementioned Web Crypto subtle API (SubtleCrypto), which provides cryptographic primitives and useful functions such as password-based key derivation (PBKDF2) and implementations of various hash functions. This restriction is likely due to the fact that, as outlined above, any client-side JavaScript transferred over HTTP is inherently insecure (at least without additional integrity checking of the downloaded JavaScript via trusted in-browser code). The Subresource Integrity feature of modern browsers supports checking the integrity of additionally downloaded JavaScript files against a hash provided in the integrity attribute of the <script> tag to be loaded. However, to the best of the authors’ knowledge, such a built-in integrity check mechanism is not natively available for the root HTML page itself; therefore, a trusted root page served over TLS is still required.
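To make the Subresource Integrity mechanism more concrete, the sketch below computes the value of an integrity attribute for a given script body. It uses the Node.js built-in crypto module for illustration; the function name sriHash, the file name app.js, and the example script body are hypothetical.

```javascript
const { createHash } = require('node:crypto');

// Computes the value for a <script integrity="..."> attribute.
// SRI values have the form "<algorithm>-<base64 digest>".
function sriHash(content, algorithm = 'sha384') {
  const digest = createHash(algorithm).update(content, 'utf8').digest('base64');
  return `${algorithm}-${digest}`;
}

// Hypothetical script body and file name, for illustration only.
const integrity = sriHash('console.log("hello");');
console.log(`<script src="app.js" integrity="${integrity}" crossorigin="anonymous"></script>`);
```

The browser recomputes the digest of the downloaded file and refuses to execute it if the value does not match, which is why the tag itself must come from a trusted page.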
Furthermore, mixed content prohibition complicates interconnection between TLS and non-TLS servers. Unsecured sites may fetch additional content from TLS servers, but the inverse is not true; browsers will in general block all unsecured HTTP requests from pages loaded via HTTPS. Therefore, in the context of a non-TLS-capable locally controlled IoT device, approaches such as loading the control interface HTML, stylesheets, and JavaScript from a TLS-enabled external server and only establishing an HTTP connection with the device for some API control commands are not trivially possible.
2.5. Inter-Window Messaging
Even though a direct HTTP or unencrypted WebSocket connection to the device from a secure context is disallowed by modern browsers, indirect communication is possible under certain conditions. Two different browser windows that have a mutual reference can communicate with each other using the window.postMessage() API to send a message and a listener for the window’s message event (which delivers a MessageEvent object) to receive incoming messages.
Note that window here refers to an independent page context; it does not need to be a different browser window as seen by a user, but can also be, e.g., a second tab. The reference to the other window can be obtained by, e.g., loading the second window in an <iframe> element embedded in the first window (though unencrypted iframe content is also disallowed on secure pages) or by opening the second window in a new tab programmatically, which makes a reference to the original window available via window.opener, thereby allowing the opened window to send an initial message. The window message API is even available if the two windows have different origins (which often means they are hosted on different servers); crucially for the JavaScript cryptography use case, one window may be a secure context while the other is not. This allows developers a controlled circumvention of enforced secure context constraints, as they can pass a message to the secure context window, let it process the message in a trusted environment using an API such as Web Crypto that is only available in a secure context, then return the result to the untrusted window via messaging.
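The request and reply pattern described above can be sketched as a message handler for the trusted window. This is an illustrative sketch only: the function name makeMessageHandler, the message format with type and payload fields, and the process callback are assumptions and are not taken from the WLED code base.

```javascript
// Hypothetical handler for the trusted (secure context) window. It accepts
// requests only from one expected origin, processes them, and returns the
// result to the sending window via postMessage.
// Assumption: messages are JSON strings of the form {"payload": ...}.
function makeMessageHandler(expectedOrigin, process) {
  return function onMessage(event) {
    if (event.origin !== expectedOrigin) return false; // ignore foreign origins
    let request;
    try {
      request = JSON.parse(event.data);
    } catch {
      return false; // ignore malformed input
    }
    if (request === null || typeof request !== 'object') return false;
    const reply = { type: 'result', payload: process(request.payload) };
    // Restrict delivery of the reply to the verified origin.
    event.source.postMessage(JSON.stringify(reply), event.origin);
    return true;
  };
}

// In a browser, the handler would be registered with:
// window.addEventListener('message', makeMessageHandler(origin, fn));
```

Passing the verified origin as the target origin of the reply ensures that the result is not delivered if the other window has navigated elsewhere in the meantime.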
2.6. TLS-Free Client Authentication Mechanisms
For a server to be able to discern authorized from unauthorized users, i.e., clients, an authorized user is required to present proof of their authorization to the server. Using TLS, this proof can be a client certificate; however, this is not an option without TLS. Most commonly, passwords are used for authentication, though they require both transport and storage protection measures against eavesdropping and subsequent misuse.
2.7. Deriving a Shared Key from the User Password
Most servers that employ transport layer security (TLS) rely on it to transmit the cleartext password, then carry out hashing operations within the server itself. This raises the question of why password hashing is not more commonly carried out on the client side, which would ensure that the cleartext password is at no time exposed to the server. One likely explanation may trace back to the “Browser Cryptography Chicken and Egg Problem” in the sense that if the server cannot be trusted to handle the password, it also cannot be trusted to serve secure cryptographic code for client-side password hashing. Another reason may be to reduce complexity or reliance on JavaScript on the client side, or to increase performance when combining a slow client device and an expensive password hashing algorithm.
In an IoT device context where TLS is not available, client-side password hashing becomes indispensable in order to avoid exposing the cleartext password during unencrypted transmission between the browser and IoT device.
An option for password hashing that is built into browsers is HTTP digest access authentication. This allows for client-side password hashing and subsequently avoids sending the password in the clear without any use of JavaScript; unfortunately, however, it is susceptible to MitM attacks ([
10], Section 5.8). Because the connection is unencrypted, the attacker can downgrade the authentication scheme to use HTTP basic access authentication, which transmits the password in cleartext with simple Base64 encoding. This manipulation is not readily detectable to the end user, since the HTTP authentication dialog is rendered identically regardless of whether basic or digest authentication is in use. Furthermore, the appearance is not customizable by the page but is always rendered as a generic popup login dialog, as shown in
Figure 2.
On a page sent via unencrypted HTTP, the JavaScript Web Crypto API is not available, as the window is not in a secure context; therefore, the built-in password-based key derivation (PBKDF2) cannot be used directly, nor can the user implement their own algorithm directly in JavaScript or download it from an external server in a secure manner due to the “Browser Cryptography Chicken and Egg Problem”.
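For reference, deriving a key from a password with PBKDF2 can be sketched as follows. The sketch uses the synchronous pbkdf2Sync function of the Node.js crypto module so that it is self-contained; inside a secure context in the browser, the same derivation is available through crypto.subtle.deriveBits with the PBKDF2 algorithm. The function name derivePsk, the salt, the iteration count, and the key length are illustrative assumptions and not the parameters used by the PoC.

```javascript
const { pbkdf2Sync } = require('node:crypto');

// Derives a 256-bit key from a password with PBKDF2-HMAC-SHA-256.
// Assumptions: salt handling, iteration count, and key length are
// illustrative choices only.
function derivePsk(password, salt, iterations = 100000) {
  return pbkdf2Sync(password, salt, iterations, 32, 'sha256');
}

// Placeholder password and salt, for illustration only.
const psk = derivePsk('correct horse battery staple', 'wled-device-1');
console.log(psk.toString('hex'));
```

Using a per-device salt means that the same password yields different keys on different devices, so a key extracted from one device does not directly apply to another.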
2.8. Non-TLS Data Transport Security Mechanisms
Full TLS is not unconditionally required to implement message transport that is protected in terms of confidentiality, integrity, and authenticity. In particular, the asymmetric key exchange steps and required buffers drive up implementation complexity and memory requirements. In cases where use of a pre-shared key (PSK) is acceptable, asymmetric cryptography is no longer required; in principle, symmetric ciphers can be used directly. Symmetric cipher suites that may be suitable in the context of IoT devices and JavaScript client implementations include the Advanced Encryption Standard (AES) and ChaCha20, which “is considerably faster than AES in software-only implementations, making it around three times as fast on platforms that lack specialized AES hardware” ([
11], p. 3). This property is very useful on constrained devices with low processing power and those that lack hardware support for AES. Ascon is a noteworthy family of symmetric ciphers designed to be particularly lightweight and standardized by the US National Institute of Standards and Technology (NIST) as “Lightweight Cryptography Standards for Constrained Devices” [
12].
In applications where confidentiality is not required, data encryption can be omitted entirely, since a cryptographically secure message authentication code (MAC) is sufficient for integrity and authenticity protection. This increases performance compared to data encryption, as calculating a cryptographic digest (hash) of the data and authenticating it is typically significantly faster than encrypting the entire message.
A widely used MAC is the hash-based message authentication code (HMAC). The sender computes the HMAC over the message using the PSK and includes the resulting tag in the transmission. The receiver, which also possesses the PSK, repeats this calculation; if the calculated tag matches the one included by the sender, the message is valid, and the receiver can be assured that the message has not been tampered with and originated from someone who knows the PSK.
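The sender and receiver roles can be illustrated with a short sketch based on the Node.js crypto module. The choice of SHA-256 as the underlying hash, the function names sign and verify, and the placeholder key are assumptions made for illustration.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

// Sender side: computes an HMAC-SHA-256 tag over the message using the PSK.
function sign(psk, message) {
  return createHmac('sha256', psk).update(message, 'utf8').digest();
}

// Receiver side: recomputes the tag and compares it in constant time.
function verify(psk, message, tag) {
  const expected = sign(psk, message);
  return tag.length === expected.length && timingSafeEqual(tag, expected);
}

const psk = Buffer.from('example pre-shared key'); // placeholder key
const msg = '{"on":true,"bri":128}';
const tag = sign(psk, msg);
console.log(verify(psk, msg, tag));            // true
console.log(verify(psk, '{"on":false}', tag)); // false (message was modified)
```

The constant-time comparison avoids leaking, through timing differences, how many leading bytes of a forged tag were correct.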
2.9. Introduction to WLED
WLED (backronym: Wireless Lighting Effects Driver) is an open-source software project conceived and maintained by the author C. Schwinne since 2016, designed for the ESP8266 and ESP32 microcontroller series by Espressif Systems Co., Ltd., based in Shanghai, CN. It is engineered to drive a large quantity of digital individually-addressable full-color LEDs; each LED may be set to a different color and brightness via a serial protocol, enabling an abundance of different lighting effects. WLED has over a hundred built-in effect modes, from simple blinking, flickering candles, and twinkling fairy lights up to complex visualizations intended to be displayed on a two-dimensional matrix of LEDs. WLED offers an easy-to-use and feature-complete web-based control user interface; additionally, native mobile applications and smart home automation system integrations are available. In this work, WLED version 0.15.0-b5 is considered.
The primary interface for controlling a WLED instance from external hosts such as home automation software as well as from the included web-based user interface is the WLED JSON API, for which detailed public documentation is available (cmp. [
13]). Within the scope of this work, it is important to note that the JSON API is available both via an HTTP endpoint (
/json) and via a WebSocket connection. WebSocket is a protocol for bidirectional messaging communication between an HTTP server and client based on a persistent TCP connection; thus, it can be used to replace resource-heavy and high-latency HTTP polling for receiving status updates from the server. The syntax of a WLED API command is a simple JSON object containing the keys to be set. For instance, the following command turns on the light and sets it to approximately half brightness (bri is an 8-bit value denoting the current overall brightness, which has a range of 0–255):
{
"on": true, "bri": 128
}
3. New Approach Methodology
This section describes how the components outlined in
Section 2 are combined to facilitate secure and lightweight control of devices via standard web browsers.
3.1. Hosting of Browser-Based Secure Cryptography
Due to the “Browser Cryptography Chicken and Egg Problem” outlined in
Section 2.3, performing most cryptographic operations or even just accepting input of a user password is fundamentally insecure on pages that have been loaded via HTTP only. In order to avoid this problem, facilitate secure message authentication, and ensure that the user password is not transmitted over an insecure connection, a trusted execution environment is required. For instance, this could be a page locally stored on the user’s device, although loading pages from downloaded HTML/JS source files is typically not implemented in browser apps for mobile devices and is also inconvenient for users. Alternatively, it could be hosted via HTTPS on a domain the user trusts (e.g.,
https://rc.wled.me, accessed on 24 December 2025), or the user could host their own server running this page for cryptographic functionality. This page is denoted as the “secure crypto tab” (SCT) in the following.
A key goal of the system to be implemented is handling all cryptographic operations within the browser client, which has two advantages: first, apart from initial WLED ESP provisioning, where the PBKDF2-derived pre-shared key needs to be stored on the ESP for HMAC verification, no secrets have to leave the user device; second, a static web page is sufficient for hosting, which greatly simplifies deployment and allows hosting on, e.g., GitHub Pages or an out-of-the-box Apache server without advanced configuration, as would be necessary for a full server-side framework such as Django.
In order for the secure crypto tab to be able to connect to the WLED instance despite the browser’s mixed content prohibition, inter-window messaging is used as explained in
Section 2.5. The user is unable to control the light by directly visiting the address of the WLED instance, as depicted in
Figure 3. Instead, as the first step, they need to open the SCT, and can subsequently open the instance user interface through it by first entering their password and clicking the “Connect” button, as shown in
Figure 4. The abbreviated code for the initial window messaging handshake is provided below.
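// Control UI tab: announce the page to the window that opened it.
function onLoad() {
  //...
  if (window.opener) {
    // '*' is used because the opener origin is not yet known at this point.
    window.opener.postMessage('{"wled-ui":"onload"}', '*');
  }
  //...
}
// Secure crypto tab (TypeScript): reply to the handshake message of the UI tab.
export function handleMessageEvent(event : MessageEvent) {
  //... origin verification, error handling
  var json = JSON.parse(event.data)
  if (json['wled-ui'] === 'onload') {
    event.source!.postMessage('{"wled-rc":"ready"}',
      {'targetOrigin':event.origin}
    );
  }
  // handling other message types
}
// Control UI tab: store the SCT window and origin for subsequent messages.
function handleWindowMessageEvent(event) {
  //... origin verification, JSON parsing
  if (json['wled-rc'] === 'ready') {
    useSRA = true;
    sraWindow = event.source;
    sraOrigin = event.origin;
  }
}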
In the first UI code block, note the second parameter ‘*’ in the call to postMessage. This sends the message to any opener window regardless of its origin, which is necessary because browsers do not allow access to window.opener.origin for cross-origin privacy reasons. This is not ideal, as with no initial way of verifying the opener origin, a malicious site could hypothetically link to the control UI and eavesdrop on the control commands as they are sent. However, because the HTTP-based transport does not offer confidentiality protection in the first place, this is a largely theoretical concern and could also be prevented by only allowing a trusted predetermined origin here, at the cost of preventing users from hosting the secure crypto tab themselves on a different origin. After the reply to the handshake message by the SCT is received, event.origin is accessible and can be used for custom filtering rules.
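The custom filtering on event.origin mentioned above could, for example, take the form of an exact-match allow-list. The following is a minimal sketch under that assumption; the function name isTrustedOrigin and the list contents are hypothetical and do not reflect the origin verification code elided from the listing.

```typescript
// Hypothetical allow-list; a self-hosting user would add their own SCT origin.
const trustedOrigins: readonly string[] = ["https://rc.wled.me"];

// Returns true only if the origin reported by the browser matches an
// allow-listed origin exactly (scheme, host, and port).
function isTrustedOrigin(
  origin: string,
  allowList: readonly string[] = trustedOrigins
): boolean {
  let parsed: URL;
  try {
    parsed = new URL(origin);
  } catch {
    return false; // not parseable as a URL, e.g., the opaque origin "null"
  }
  // Compare normalized origins instead of string prefixes to avoid tricks
  // such as "https://rc.wled.me.evil.example".
  return allowList.some((allowed) => new URL(allowed).origin === parsed.origin);
}
```

A handler such as handleWindowMessageEvent could call isTrustedOrigin(event.origin) before accepting the handshake reply, at the cost of requiring self-hosting users to extend the list.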
3.2. Cross-Origin Resources and Security Header Profile
In order for window messaging to be possible, at least one of the two involved windows needs to retain a reference to the other window. This is either the value returned from
window.open() in the SCT or
window.opener in the UI tab. As the UI tab and SCT are different origins, the Same-Origin policy already restricts the possible operations on the window reference. Not only is there no access to
window.opener.origin as outlined above, direct access to the DOM (Document Object Model) or JS is also inhibited, ensuring that the UI tab and a potential MitM attacker can only post messages to the secure tab but not modify it or access sensitive JS variables in any other way. Still, it would be desirable to make this relation unidirectional, meaning that only the SCT could send the UI tab messages by default. However, this is not supported, since the measures to inhibit (cross-origin) opener references break the relation in both ways. This is achieved by, e.g., setting the
noopener feature on
window.open() or setting the
Cross-Origin-Opener-Policy (COOP) header on the SCT to
same-origin, which causes the opened UI window to be opened in a new Browsing Context Group (BCG), which has no references to the opener. Therefore, to enable
postMessage, the SCT must necessarily open the UI tab in the same BCG. Still, the COOP header on the SCT can be set to
same-origin-allow-popups, which ensures that the SCT can open the UI window in the same BCG using
window.open(), as the UI window is using a COOP of
unsafe-none. At the same time, this safeguards the SCT itself from being opened by other potentially malicious origins in the same BCG [
14]. For additional defense-in-depth, it is advisable to set further security headers to take advantage of modern browser features in order to limit the attack surface for cross-site scripting (XSS) and other types of cross-origin attacks such as UI redress. It is important to note that headers on the UI page can be modified by an MitM adversary; therefore, their value cannot be relied on and should be regarded only as a supplemental measure. Headers sent by the server for the SCT are protected against manipulation through TLS, but cannot always be set at will when using static hosting providers such as GitHub Pages. For the PoC, the security headers in
Table 1 are used.
3.3. Technology Stack
The ESP-side code, which extends WLED with HMAC verification capabilities, is written in C++ (C++17). The cryptographic primitive implementations for SHA256 and HMAC are provided by the ESP8266 Crypto library, although the code is expected to be fully portable except for the ESP-specific random number generator implementation. The library also provides an AES encryption implementation that may be useful in case the implementation is to be modified to offer confidentiality protection. The library is contained in a single source file, Crypto.cpp, that is below 1000 lines of code, making it easy to independently analyze.
The WLED web UI functionality is implemented in a single JavaScript file, index.js.
For the secure crypto tab, which implements all in-browser cryptographic operations, a TypeScript project is used that relies on modules to improve code structure. JavaScript modules allow for more efficient implementation of object-oriented programming concepts by, for instance, requiring all public functions of a module to be explicitly exported for use in other modules. Other functions are private by default. TypeScript is a superset of JavaScript that adds static typing, and as such enables type checking. The Vite build tool is used to transpile the modules back to JavaScript and merge them into a single file again, which can then be executed in standard browsers. The user-facing appearance of the SCT is shown in
Figure 4.
3.4. PBKDF2-Based PSK Derived from User Password
In order to avoid potential unencrypted transmission of cleartext passwords between the browser and ESP, only a hash of the password is ever transmitted outside of the SCT. The PBKDF2 algorithm is chosen for the PoC, as it is the only algorithm available in the Web Crypto API that is suitable for deriving keys from low-entropy passwords (cmp. [
16]). Newer hashing algorithms designed to be used for passwords, for example Argon2id, would be preferable, as they are more resistant to attacks based on Graphics Processing Units (GPUs) or Application-Specific Integrated Circuits (ASICs) due to requiring use of a large amount of system memory (cmp. [
17], Section 4). As the KDF only needs to run on the user browser side, there is no requirement to implement the KDF on the IoT device itself, since it can be provisioned directly with the derived key. Therefore, substitution of PBKDF2 with an algorithm such as Argon2id that is hardened against GPU and ASIC attackers is possible by employing a suitable and well-vetted implementation that can run within the browser, preferably using WebAssembly (WASM) for increased performance (cmp. [
18] for an existing candidate implementation). For the PoC implementation of PBKDF2 with the SHA256 hash function, a work factor of 1,000,000 iterations is initially chosen. This offers some buffer over the 2023 OWASP recommendation [
19] of 600,000 iterations. The Web Crypto implementation is tested for performance in
Section 4.3, since the key derivation should be intentionally slow to mitigate attacks but fast enough to not be a hindrance to the end user for a single derivation. Ideally, the key derivation time should stay within a period of time that the user perceives as quick. The Doherty threshold [
20] of 400 ms could be considered a reasonable benchmark [
21].
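The key derivation described in this section can be sketched with the Web Crypto API as follows. This is an illustrative sketch rather than the PoC code: the function name derivePsk and the salt parameter are assumptions, as salt handling is not specified in this section.

```typescript
// Sketch: derive an HMAC-SHA256 key from a password using PBKDF2 (Web Crypto).
// The function name and the salt parameter are assumptions for illustration.
async function derivePsk(
  password: string,
  salt: string,
  iterations: number = 1_000_000
): Promise<CryptoKey> {
  const encoder = new TextEncoder();
  // Import the raw password bytes as PBKDF2 base key material.
  const baseKey = await crypto.subtle.importKey(
    "raw", encoder.encode(password), "PBKDF2", false, ["deriveKey"]
  );
  // Stretch the password into a 256-bit key usable with HMAC-SHA256.
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", hash: "SHA-256", salt: encoder.encode(salt), iterations },
    baseKey,
    { name: "HMAC", hash: "SHA-256", length: 256 },
    true, // extractable, so that the derived key can be exported for provisioning
    ["sign"]
  );
}
```

The returned CryptoKey can be passed directly to crypto.subtle.sign for HMAC generation. Marking the key as extractable is only needed where the derived key must be exported, e.g., for provisioning the device.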
3.5. Generation of the MAC
The Web Crypto API is also used for generation of the HMAC message authentication code. The TypeScript function utilized for HMAC generation is provided below; note that some non-critical lines are omitted for brevity and readability:
// Computes the HMAC of a message and returns it as a hexadecimal string.
async function generateHMAC
(message: string, key: CryptoKey) : Promise<string>
{
  // The Web Crypto API operates on byte buffers, so the string is encoded first.
  const messageBuffer = new TextEncoder().encode(message);
  const sig = await crypto.subtle.sign('HMAC', key, messageBuffer);
  return Array.from(new Uint8Array(sig)).map(function(byte) {
    return ('0' + (byte & 0xFF).toString(16)).slice(-2);
  }).join('');
}
This function is a good example of how the Web Crypto API can be used. Calling the primitives is usually accomplished by a single line, though some setup code is needed, particularly to set parameters correctly and to convert formats, as the Web Crypto API only operates on byte buffers (Uint8Array in TypeScript). The last three lines of code in the function merely convert the returned HMAC to a hexadecimal string for sending it to the ESP over the network.
3.6. Transport of the HMAC
After authenticating a JSON message, the HMAC needs to be sent along with the message in order for the receiver to be able to verify that the sender is in possession of the pre-shared key by calculating the HMAC of the message itself and comparing it to the one sent along with the message. There are multiple ways to implement such data transfer with an associated authentication code; the most relevant for the application in WLED are JSON payloads either over HTTP or WebSocket. In a pure HTTP environment where JSON API commands are transmitted via HTTP POST request, a straightforward implementation method would include the HMAC as an HTTP header separate from the JSON payload sent in the POST request. However, this approach is not possible when using a WebSocket connection, as this is just a persistent TCP connection allowing for bidirectional messaging, and individual WebSocket messages do not carry HTTP headers. Thus, the HMAC needs to be integrated into the message itself. For this purpose, the WLED JSON API calls are wrapped using the following JSON syntax:
{
"mac": "baddecafc0ffee[...]",
"msg": {
"on": true,
"n": {"sid":"5e55101d[...]","c":41}
}
}
Here, the msg object ({“on”:true}) is the original API command, in this case to turn on the lights. An additional key n is added; this is a session ID and counter-based nonce used to prevent replay attacks. This is wrapped in another JSON object that also contains a string value mac containing the HMAC of the API command encoded as a hexadecimal string.
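The wrapping step can be sketched as follows. The function names are hypothetical, and the sketch assumes that the MAC is computed over the serialized msg string, which is therefore embedded verbatim so that the authenticated bytes are exactly the transmitted bytes; the serialization used by the PoC is not described here.

```typescript
// Nonce as described in the text: session ID (hex string) plus message counter.
interface Nonce {
  sid: string;
  c: number;
}

// Adds the nonce under the key "n" and serializes the command exactly once.
function serializeWithNonce(command: Record<string, unknown>, nonce: Nonce): string {
  return JSON.stringify({ ...command, n: nonce });
}

// Wraps the already-serialized msg together with its hex-encoded MAC.
// The msg string is inserted verbatim (assumption of this sketch) so that
// re-serialization cannot change the bytes covered by the MAC.
function wrapMessage(msgJson: string, macHex: string): string {
  return `{"mac":${JSON.stringify(macHex)},"msg":${msgJson}}`;
}
```

In this sketch, macHex would be the result of calling generateHMAC on the string returned by serializeWithNonce.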
3.7. Verification of the HMAC
The verification of the HMAC itself, without nonce validation for replay attack mitigation, is simply implemented by re-calculating the HMAC of the message data and comparing it to the MAC sent along with the message.
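MAC comparisons are commonly performed in constant time so that the duration of the check does not reveal how many leading characters matched. The ESP-side verification is written in C++ and is not shown in this section; the following TypeScript sketch only illustrates the comparison step, with macsEqual being a hypothetical name and both inputs assumed to be lower-case hexadecimal strings.

```typescript
// Compares two hex-encoded MACs without returning early on the first
// mismatch, so the running time does not depend on where they differ.
function macsEqual(expectedHex: string, receivedHex: string): boolean {
  if (expectedHex.length !== receivedHex.length) {
    return false; // the length of an HMAC-SHA256 value is public anyway
  }
  let diff = 0;
  for (let i = 0; i < expectedHex.length; i++) {
    // Accumulate differences with XOR/OR instead of branching per character.
    diff |= expectedHex.charCodeAt(i) ^ receivedHex.charCodeAt(i);
  }
  return diff === 0;
}
```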
3.8. Nonce Implementation
As mentioned above in
Section 3.6, the nonce consists of a session ID and a counter that is incremented for every message sent. For the session ID, a random value with a length of 128 bits (16 bytes) is chosen to balance the collision risk with the associated memory use and transmission length. The session ID is sent as a hexadecimal-encoded string; a single byte is represented by a set of two characters (e.g.,
FF for decimal 255), meaning that the entire session ID is represented as a 32-character long hex string. A suitable source of entropy is required for secure random number generation. The ESP series of boards has a special register for obtaining hardware-seeded random numbers ([
22], Section 25). The counter is incremented by the client SCT on every message sent. The connection between the control UI tab and the WLED instance is always established via either WebSocket or HTTP. As both protocols utilize TCP for packet transport, which guarantees in-order delivery of packets, keeping track of a sliding window of allowed counter values is not required. It is sufficient for the ESP to reject any value less than or equal to the last received counter value for that session ID. A 32-bit length is chosen for the counter, as this theoretically allows for over four billion messages to be authenticated before a new session ID must be used to prevent the unsigned counter from overflowing and wrapping back to 0. The ESP keeps track of allowable nonces by storing a configurable number of session ID/counter pairs. Each pair uses 20 bytes of memory (16 bytes for the session ID and 4 for the counter); the array is sorted by how recently each session ID was used, so that the oldest session ID can always be replaced with a new one when the array is already full. Generally, it is advisable to replace the oldest session ID; however, if multiple session IDs are requested without ever being successfully used in a MAC-authenticated message, it may be preferable to replace them first. This could mitigate a potential “session ID exhaustion” denial of service attack, where new sessions are continually started, invalidating session IDs still in use by legitimate clients. It is crucial that the nonce is included
within the HMAC-authenticated message; if it were sent alongside the message, an attacker could tamper with the nonce without invalidating the MAC.
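The bookkeeping described above can be sketched as a small table of session ID/counter pairs with replacement of the least recently used entry. The ESP-side implementation is written in C++; this TypeScript sketch only illustrates the logic. The class and method names are hypothetical, and starting the counter at 0 (so that the first accepted value is 1) is an assumption.

```typescript
// Sketch of the replay protection table; names are hypothetical.
class NonceTracker {
  private readonly capacity: number;
  // Ordered by recency of use: the most recently used session is last.
  private sessions: { sid: string; lastCounter: number }[] = [];

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Registers a new session ID, evicting the oldest entry if the table is full.
  register(sid: string): void {
    if (this.sessions.length >= this.capacity) {
      this.sessions.shift();
    }
    this.sessions.push({ sid, lastCounter: 0 });
  }

  // Accepts a nonce only if the session is known and the counter is an
  // unsigned 32-bit value strictly greater than the last accepted one.
  accept(sid: string, counter: number): boolean {
    const session = this.sessions.find((s) => s.sid === sid);
    if (session === undefined) return false;
    const valid =
      Number.isInteger(counter) &&
      counter <= 0xffffffff &&
      counter > session.lastCounter;
    if (!valid) return false;
    session.lastCounter = counter;
    // Move the session to the end to mark it as most recently used.
    this.sessions = this.sessions.filter((s) => s !== session);
    this.sessions.push(session);
    return true;
  }
}
```

In such a design, accept should only be called after the HMAC of the message has been verified; otherwise, an attacker could advance the stored counter with forged messages.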
3.9. Communication Flow
The implemented system consists of three codebase components running independently: the secure crypto tab, the user interface tab, and the WLED firmware running on the IoT device.
There are two core interactions between the components: the initial login authentication and subsequent authentication of control commands. Both core interactions are visualized below;
Figure 5 shows the initial login process, while
Figure 6 outlines the control commands authentication process. The “pipe” symbol in the UI tab denotes relay functionality; the UI tab does not process the message itself but just forwards it from the SCT to the ESP or vice versa, changing the message transport protocol from window messaging to WebSocket or back from WebSocket to window messaging, respectively.
3.10. Design Choices Based on Threat Model
Table 2 justifies the design choices that have an impact on system security against an explicit threat that they aim to mitigate with regard to the threat model defined in
Section 2.1. Concrete and testable security objectives are also provided.
5. Discussion
In the following, several aspects of the implementation are reflected upon and analyzed in terms of their advised usability in production systems and their long-term safety against quantum computing algorithms.
5.1. Secure Production Deployment of the Implemented Solution
While the architecture drafted and implemented above was continuously scrutinized in a best-faith effort to be secure, it should be considered strictly a demo implementation for research and testing purposes only. The authors do not guarantee the security of the implemented solution, and it should only be used in practical applications after being thoroughly and independently verified, e.g., through penetration testing and formal analysis of the protocol message exchanges. Although only known and well-vetted cryptographic primitives are implemented, custom cryptographic system designs can easily be prone to subtle errors that render the entire system ineffective. One example of an aspect that is not immediately obvious but is essential for the security of the design is the nonce-based replay attack mitigation for HMAC-based message authentication schemes. Additional defense-in-depth measures, such as schema validation of incoming messages and stricter headers than used in the PoC, should also be implemented.
5.2. Post-Quantum Cryptography Considerations
There are currently no indications that state-of-the-art symmetric ciphers such as AES-256 or MAC-based approaches such as those implemented in this work would be meaningfully reduced in their security value by quantum-based approaches; therefore, they are deemed quantum-resistant (cmp. [
23], p. 11). As opposed to current TLS implementations, the security of the implemented solution is likely to remain unaffected by the availability of powerful quantum computing resources given the current state of research. This is particularly important with embedded systems such as used in IoT devices, as they are typically in use for significantly longer periods of time than conventional IT hardware.
5.3. Potential for MitM Command Injections
While the implemented solution demonstrates that implementing hash-based security measures is highly feasible on constrained devices with limited resources such as the ESP8266, there are some use cases and limitations not yet addressed by the current implementation.
While the implemented system already significantly raises the level of transport security and makes it more difficult for unauthorized parties to send valid unauthorized commands, there is still a significant risk of MitM attacks that allow for MAC verification of unauthorized commands, again with roots in the “Browser Cryptography Chicken and Egg Problem”. Although the implemented system delegates the relevant cryptographic operations to the SCT, all commands to be authenticated still originate from the untrusted control UI that is loaded from the ESP web server via insecure HTTP. This would allow the attacker to insert arbitrary control commands into the UI page source, which could then be automatically authenticated by HMAC generation in the SCT and pass the ESP-side HMAC verification as regular MAC-authenticated commands.
A possible way to address this problem is by hosting the entire control UI in the SCT; this would ensure that no MitM command injection would be possible. The page loaded from the ESP web server over the insecure HTTP connection would be reduced to acting as a gateway that receives the window message from the SCT and passes it to the ESP via the WebSocket connection. This approach has one drawback, namely, that the SCT must have an available version of the UI that is API-compatible with the WLED version installed on the ESP. While this may be straightforward for standard builds, it would again make it difficult for developers of custom features to have their version of the UI used by the SCT. This problem could be partially alleviated by allowing the ESP to specify a URL where a compatible version of the UI is hosted. While this approach would require retrieving the UI code via that URL and incur the associated limitations regarding availability of online services, this URL could be part of the ESP-side firmware and also be protected against MitM URL replacement attacks using an HMAC generated on the ESP side.
5.4. Comparison to Alternative Approaches
It would be highly useful to effectively compare key characteristics of the proposed approach with other lightweight secure communication protocols for IoT, particularly with regard to resource utilization and asserted security goals. This would allow for selection of an architecture that is most suitable for a given application. Therefore, additional comparative benchmarks should be conducted in future work not only against a baseline build with no security provisions, as in
Section 4.1, but also against a full TLS server implementation, e.g., mbedTLS on an ESP32. We have included an initial indicative latency comparison in
Section 4.5. For the ESP8266, no practically usable TLS server implementation was found to be available, which is to be expected due to its severely limited memory. Furthermore, comparisons with more lightweight IoT protocols based on, e.g., lightweight symmetric primitives such as Ascon [
12], would be highly interesting, even if they are not directly applicable to a direct browser-to-device control scenario.
5.5. A Case for TLS-PSK Browser Support
A standardized alternative to the implemented approach could be TLS-PSK or the functionally similar TLS-SRP (Secure Remote Password). Similar to our approach, these allow for a TLS connection without requiring asymmetric key agreement or an X.509 public key infrastructure by relying on a pre-shared key. This makes TLS-PSK potentially more suitable for private networks and constrained IoT devices than standard TLS, as only symmetric primitives such as AES are required and limited IoT devices are a primary use case [
24]. Unfortunately, standard web browsers do not support TLS-PSK, and even if they did its primary use case would likely be session resumption for connections that are initially established using public key cryptography [
25], with no provision for its use as the exclusive mutual authentication scheme. Were browsers to support TLS-PSK with external PSKs established out-of-band, we would expect the feasibility of a very lightweight TLS server implementation to greatly increase on low-end IoT devices such as the ESP8266.
5.6. Browser Evolution and Possible Direction
Modern web browsers are continually evolving to improve the security of the web, protect user privacy, and mitigate newly discovered vulnerabilities [
26]. In the process, legacy APIs or those deemed insecure can be restricted or removed. Notably, mixed content, i.e., accessing an HTTP resource from an HTTPS page, was allowed by default in all major browsers up until 2013 (Firefox), 2016 (Safari), and 2019 (Chromium) [
27]. It is possible that the new approach presented herein could also be inhibited in future browser versions, for example by always opening mixed-content windows in a new BCG, thereby removing the window reference needed to post messages to opened windows. The authors propose careful exemptions to the mixed-content prohibition with strict guardrails to enable a more seamless user experience without requiring our dual-tab approach. Such guardrails could include, but are not limited to: (1) insecure content cannot be directly embedded into an HTTPS page; instead, the insecure response can only be read by JS in escaped text or binary form; (2) insecure requests must be explicitly enabled for each request with a flag called, e.g.,
unsafe-mixed. We believe that this would be a welcome addition that would facilitate interoperability between local and legacy devices on the one hand and the wider secured web on the other.
5.7. Multi-Device and Multi-User Handling
A practical usability concern arises when implementing a system based on the proposed approach for actual daily use in a smart home environment. In a typical household, users are likely to own more than one device of a given category, especially in the case of lighting; in addition, multiple users may wish to have control of the device(s) [
28]. Both of these use cases should be supported in a frictionless manner in a practical IoT application. The implemented PoC does not yet adequately cover these use cases, as the SCT only allows establishing a connection to a single device at a time; thus, it is necessary to open two browser tabs for each device to be controlled. Furthermore, only a single password and associated PSK are considered in the PoC. This makes its use in a multi-user environment infeasible without password sharing, which is a fundamentally flawed practice from a security perspective. However, the implemented PoC can be easily augmented to mitigate both of these multi-user and multi-device usability constraints while keeping the same underlying approach to lightweight security. For proper multi-user support, the IoT device could be configured to store a map of multiple user IDs and associated PSKs; users could then enter a username or ID in addition to the password in the SCT. This ID would be sent within the authenticated message JSON and subsequently used by the device to select the corresponding PSK for HMAC verification. Multi-device support can be improved by adding the capability for a single SCT to be shared by and interface with
N devices, thereby reducing the overall number of required tabs from 2 ×
N to
N + 1. If the control UI is moved to the SCT to mitigate the possible command injection attack outlined in
Section 5.3, the user could dynamically select which device(s) to control through a list or dropdown interface. This approach reduces the tabs the user needs to actively interact with from 2 ×
N (two tabs for each device, its SCT for authentication and the UI tab for control) to just a single secure crypto tab, allowing for a more streamlined user experience.
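The multi-user extension outlined above could select the PSK per message as sketched below. This is a hypothetical illustration and not part of the PoC: the key name user, the message shape, and the function name selectPsk are all assumptions, and an actual device-side implementation would be written in C++.

```typescript
// Hypothetical message shape: the user ID travels inside the authenticated msg.
interface AuthenticatedMessage {
  mac: string;
  msg: { user?: string; [key: string]: unknown };
}

// Looks up the PSK for the user named in the message. Returns undefined for
// unknown or missing users so that the caller can reject the message.
function selectPsk(
  message: AuthenticatedMessage,
  pskByUser: ReadonlyMap<string, string>
): string | undefined {
  const user = message.msg.user;
  return typeof user === "string" ? pskByUser.get(user) : undefined;
}
```

Because the user ID is part of msg, it is covered by the HMAC; tampering with it would cause verification to fail under the PSK of the substituted user.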
5.8. Credential Lifecycle Management
Our approach is primarily intended for operational remote control of IoT devices that are part of a shared or public (and consequently untrustworthy) network. It defines no methods for the initial key agreement or subsequent PSK rotations, for instance, the user changing their password. It is assumed that the user occasionally has access to a secure out-of-band channel, for example a physical device bus. Implementing credential management as part of our approach would necessarily require transport confidentiality protection, which would in turn require encryption and make a purely MAC-based approach infeasible. One aspect of lifecycle management that can be trivially accommodated in the current approach is credential revocation: if the user suspects that their PSK has been compromised, they can send a special authenticated control message that invalidates the PSK on the device, ensuring that it is no longer accepted for authenticating additional messages.