IoT Design : An Approach to design resilient, robust, reliable and secure systems
Designing an embedded system suited to IoT requires a different approach from a classic embedded design. It is safe to assume that an 'Embedded IoT' device will be connected, at some point, to the internet. Connectivity alone presents plenty of challenges. On top of this, our small embedded device is expected to keep pace with rapid changes in the IT and InfoSec world, be as easy to update as a PC and be as secure as a 24/7 monitored data center. These are big tasks, but they are achievable with the correct mindset and system design approach.
First, we need to accept that designing an embedded system for use in an IoT application is different to how many stand-alone, unconnected devices are implemented. We must look at where the device fits into the whole system and perform the relevant threat modelling and risk analysis to determine how the device must perform and what it needs to be capable of in that context. If we are connecting a device, wired or wirelessly, then we need to accept that at some point in the service lifecycle of that device, which could be 1, 2, 5, 10 or 20 years, ‘somebody’ will try to attack the device itself or the protocols and standards it uses. The ramification is that security, and the ability to securely update our devices, must be a key system design priority. A side effect is that our software maintenance cycle must also be designed and managed so that updates can be performed easily and in a timely manner to meet customer or functional demands over the same lifetime. This is a potentially huge initial undertaking when all you want is to connect your device to the internet. Even so, how you connect it, what your system architecture looks like, how you manage events (some of which may be outside your direct influence) and what issues may cascade into the broader system from a seemingly isolated event must all, along with many more factors, be considered at the outset of the design.
An IoT device requires input from multiple interested parties: IT, Marketing, Engineering, Sales, executive staff, finance and legal must ALL come together to clearly define and understand the needs, expectations, cost and deliverables of the physical product itself. Significant consideration must be given to the legal implications surrounding product compliance, liability and data collection, storage and usage, plus the business processes and procedures that must be in place to deal with a data or integrity breach. What is the business model to cover the long-term costs of running the device in your cloud-connected system? What user benefit does it provide? How a product is designed and, more crucially, maintained over the long term is heavily influenced by those initial decisions. Even a request to ‘just add Bluetooth’ to product ‘x’ requires as much consideration from a business as a full cloud-connected system.
The advent of legislation such as the EU GDPR has placed many requirements on even the most benign of operations, such as reading a connected thermostat. To help navigate some of this complexity, guidance such as the UK.gov ‘Secure by Design’ and ENISA guidelines, plus documents from the IoT Security Foundation, is available to bring some procedural formality to design.
For many businesses it is of paramount importance to maintain and protect the integrity of their brand. A connected design that is implemented poorly, without consideration for user experience, security, maintenance, trust in data handling and product integrity, can very quickly cause irreparable damage to the brand when things go wrong, with knock-on effects on user confidence, trust and ultimately sales and the bottom line.
Ease of use and the need to enhance security are often at odds in design. Complex passwords and device registration processes are, more often than not, incompatible with humans. The challenge, then, is how to design security into the product at manufacture and make it implicit in the operational infrastructure, whilst keeping it compatible with the human need for simplicity.
If we consider a connected embedded product, it consists of four core hardware elements:
- Processing
- Memory
- Communications
- Secure Hardware Element
Plus the all-important fifth element: Software
The four core hardware elements of a Secure, Robust, Reliable, Resilient IoT Solution: Processing, Memory, Communications and Secure Hardware Element, plus the all-important and increasingly complex software element.
These core elements are common across all systems. Exactly how they are implemented is a design choice based on the exact use-case of the product, the assessment of and attitude to risk, cost, development capabilities, security and maintainability. The target should be a system design that is Robust, Reliable, Resilient, Recoverable, Secure, Trusted, Scalable, Maintainable, Manufacturable and Usable, that maintains brand integrity and that achieves a cost target suited to the demands the system will face. The demands on a connected system are different to those of a standalone, stripped-down, non-updateable classic embedded design, so it should be assumed that the cost to design, develop and manufacture a connected device that suitably meets the challenges it will face in service will increase. But if the device is designed correctly, and you gain some competitive advantage and/or your connected product enables changes to business cost and dynamics, then the value your product provides to your customer and the operational gain to your business will, over the long term, far outweigh the component cost. The cost of doing IoT correctly and the cost of fixing it when it goes wrong are orders of magnitude apart.
In many situations more than one processor will be in use on an IoT device. The wireless device will often include some level of processing, especially for Wi-Fi and Bluetooth, to manage the complex underlying protocols. Exactly how your system is architected is a function of its use-case, needs, attitude to risk and many other factors.
Root of trust
A secure system requires a trusted means to store secrets and to authenticate the validity of a system whilst ensuring the secret itself is never revealed. To achieve this a ‘trust anchor’ is required, usually in the form of a Secure Element or Crypto Authentication device. These devices provide multiple physical countermeasures against known hardware attacks whilst adding functions such as NIST-approved True Random Number Generators (TRNG) and key verification methods such as FIPS-compliant ECDSA (Elliptic Curve Digital Signature Algorithm). The days of trying to obfuscate keys within program memory are over, because it is now cost effective to use a secure hardware device and leverage multiple benefits.
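The idea of proving possession of a secret without ever revealing it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToySecureElement` and `verify_device` are hypothetical names, and HMAC-SHA256 from the standard library is used as a symmetric stand-in. A real secure element would typically sign the challenge with ECDSA, so the verifier would only need the public key rather than a copy of the secret.

```python
import hashlib
import hmac
import secrets


class ToySecureElement:
    """Hypothetical software stand-in for a hardware secure element."""

    def __init__(self, secret: bytes) -> None:
        self.__secret = secret  # no method ever returns this value

    def respond(self, challenge: bytes) -> bytes:
        # Only a keyed digest of the challenge leaves the element.
        return hmac.new(self.__secret, challenge, hashlib.sha256).digest()


def verify_device(element: ToySecureElement, shared_secret: bytes) -> bool:
    """Verifier side: send a fresh random challenge and check the response."""
    challenge = secrets.token_bytes(32)  # fresh per attempt, so replays fail
    expected = hmac.new(shared_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, element.respond(challenge))


provisioned = secrets.token_bytes(32)               # loaded at manufacture
genuine = ToySecureElement(provisioned)
clone = ToySecureElement(secrets.token_bytes(32))   # does not know the secret

genuine_ok = verify_device(genuine, provisioned)
clone_ok = verify_device(clone, provisioned)
```

Here `genuine_ok` is true and `clone_ok` is false, yet the secret never crosses the interface in either direction.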
A secure element helps provide:
- Authentication of device to cloud-based services using well-tested and well-understood Public Key Infrastructure methodologies. This allows devices to be pre-registered on the system at manufacture, with an individual certificate per device. A QR code derived at manufacture ties the end product to a specific certificate. At commissioning the user ties the same QR code to their account, and the secure back-end system links the certificate to the customer account. This gives a simple, secure commissioning process whilst meeting regulatory requirements.
- Authentication of data. Using the trust anchor it is possible, short of direct physical intervention at the sensor, to verify that the measurements from a device really do come from that device and haven’t been tampered with. This also aids spotting anomalies in data through cloud analytics, since large-scale physical intervention is difficult to achieve.
- Secure Boot. The secret stored in the secure element is used to identify changes in the cryptographic signature of the host MCU and of stored firmware update images. Further run-time integrity checks, leveraging methods from Class B Safety Libraries, can also be used.
- Secure Firmware Upgrade Over the Air (FUOTA). The secret stored in the secure element is used to check the integrity of the source of the update and also the signature of the image sent to the device, prior to bootload, to confirm the image is valid.
- Anti-Cloning. If manufacturing is managed correctly, a secure element helps prevent cloning and counterfeiting of hardware.
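The Secure Boot and Secure FUOTA checks described in the list above can be sketched as a two-step test: first confirm the update description came from a trusted source, then confirm the image matches it. This is a minimal illustration, not a definitive implementation. The `Manifest` layout and function names are hypothetical, the key is a demo value, and an HMAC tag is used as a stand-in for the ECDSA signature a secure element would normally verify.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    version: int
    size: int
    sha256: bytes  # digest of the firmware image
    tag: bytes     # authenticity tag over the three fields above


def _manifest_bytes(version: int, size: int, digest: bytes) -> bytes:
    return version.to_bytes(4, "big") + size.to_bytes(4, "big") + digest


def make_manifest(key: bytes, version: int, image: bytes) -> Manifest:
    """Build-server side: describe and authenticate an image."""
    digest = hashlib.sha256(image).digest()
    tag = hmac.new(key, _manifest_bytes(version, len(image), digest),
                   hashlib.sha256).digest()
    return Manifest(version, len(image), digest, tag)


def image_is_valid(key: bytes, manifest: Manifest, image: bytes) -> bool:
    """Device side: check the source first, then the image itself."""
    expected_tag = hmac.new(
        key,
        _manifest_bytes(manifest.version, manifest.size, manifest.sha256),
        hashlib.sha256,
    ).digest()
    if not hmac.compare_digest(expected_tag, manifest.tag):
        return False  # manifest was not produced by a holder of the key
    if len(image) != manifest.size:
        return False
    return hmac.compare_digest(hashlib.sha256(image).digest(), manifest.sha256)


key = b"demo-key-held-in-secure-element!"  # assumption: demo value only
image = bytes(range(256)) * 4
manifest = make_manifest(key, 2, image)

good = image_is_valid(key, manifest, image)
tampered = image_is_valid(key, manifest, image[:-1] + b"\x00")
forged = image_is_valid(b"attacker-key", manifest, image)
```

Only the untouched image with the genuine key passes; a single changed byte or a manifest checked against the wrong key is rejected before anything is loaded.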
However, storing the secrets for an individual product within a hardware secure element also requires that the devices are programmed in a secure manufacturing environment. This poses issues around scalability and trust, associated in many cases with sub-contract manufacturers: ‘Just how secure is the manufacturing line, and specifically who has potential access to my secret information in that manufacturing environment?’ Therefore, the ability to purchase devices with your private information pre-programmed by the device vendor in a secure environment, and to load the public information up to your cloud service in a simple, automatable process, provides manufacturing flexibility and eases commissioning.
Classically, a firmware update is performed using a cable connected directly to the device’s serial port. This has worked for many years, but how does the approach scale to connected devices deployed in large numbers, possibly in inaccessible locations? We can use the old, proven ‘people in vans’ approach, but it is simply too expensive and too difficult to scale, especially at speed if a critical issue needs attention. If an update must be deployed quickly, outside a routine maintenance cycle, then an approach needing physical intervention should be avoided. The alternative is Firmware Upgrade Over the Air (FUOTA), and ideally Secure FUOTA. Since this is a hands-off approach, the system needs to leverage the integrity enabled by the secure element to prevent rogue updates from an unknown or untrusted source.
But how should the update itself be performed? Ideally, Secure FUOTA updates should leave the host MCU operation intact. Performing an update directly on device program flash memory, with no local backup, risks the dreaded ‘bricking’ scenario in which an unrecoverable error occurs during the update. For example, executable flash memory has been erased when an ESD or power event causes a re-boot or latch-up. With an erased block of memory, your device should then fail its startup integrity checks and essentially turn into a brick. The situation is only recoverable with a potentially expensive physical intervention. Of course, there is an argument for splitting program memory 50/50 and vectoring to one of two functional images stored in executable program flash memory. The inherent risk here comes from the device keeping an old version of code in executable memory: a corrupt pointer or program counter, whether caused by noise, ESD or a physical attack, can result in numerous high-risk scenarios playing out.
Thus, using an external, non-executable memory to store update images reduces risk in your system. It also allows the actual bootloader code to be small and simple, since it only needs to handle the comms to the memory and interpret the image: no comms protocols, no radios. It does need the robustness to handle fault scenarios, check image integrity and ensure a valid image is loaded into the device, and then to ensure the device can connect to the system again and confirm the update cycle. If this fails, the original image remains resident locally, so it can be re-loaded, comms re-established and a failed-update procedure performed.
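The external-memory approach can be simulated in Python to show why it resists bricking. This is a sketch under assumptions, not a real bootloader: the slot names `staged` and `backup`, the `bootloader` function and the dictionary standing in for external memory are all hypothetical, and a plain SHA-256 digest stands in for the signature check a secure element would provide.

```python
import hashlib
from typing import Dict, Optional, Tuple

Slot = Tuple[bytes, bytes]  # (image bytes, expected SHA-256 digest)


def make_slot(image: bytes) -> Slot:
    return image, hashlib.sha256(image).digest()


def bootloader(external: Dict[str, Optional[Slot]], flash: bytearray) -> str:
    """Program flash from the first valid image found in external memory.

    "staged" is a freshly downloaded update, "backup" the last known-good
    image. Returns the name of the slot that was loaded, or "none".
    """
    for name in ("staged", "backup"):
        slot = external.get(name)
        if slot is None:
            continue
        image, digest = slot
        if hashlib.sha256(image).digest() != digest:
            continue  # corrupt image: never copy it into executable memory
        flash[:] = image  # stands in for the erase-and-program step
        if hashlib.sha256(bytes(flash)).digest() == digest:  # read-back check
            return name
    return "none"


old, new = b"firmware v1" * 100, b"firmware v2" * 100
external = {"staged": make_slot(new), "backup": make_slot(old)}

# Case 1: a reset left program flash erased part-way through programming.
flash = bytearray(b"\xff" * 64)
first = bootloader(external, flash)

# Case 2: the staged download is corrupt, so the backup is re-loaded.
external["staged"] = (new[:-1] + b"!", external["staged"][1])
flash2 = bytearray(old)
second = bootloader(external, flash2)
```

In the first case the update still completes because the image lives off-chip; in the second the device falls back to the original image and can report the failed update.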
The FUOTA method chosen should be comms-medium agnostic. That is to say, the same process should be able to cope with the bandwidth, latency, dropouts and losses of any physical medium. This way the same back-end process on the server side and the same download, storage and integrity-check mechanism on the device side can be employed and maintained across a variety of different media. Inevitably, a one-method-fits-all approach will need some variation depending on your complete system deployment. But keeping close to a standard approach using modular methods aids code maintenance over the long term.
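One way to picture a medium-agnostic download is to hide the link behind a single function and keep the retry and integrity logic identical for every medium. The sketch below assumes a hypothetical `Transport` callable that returns a block and its CRC-32, or `None` when the exchange was lost; the block size, retry count and the two demo transports are illustrative only.

```python
import zlib
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: returns (payload, crc32) for a block index,
# or None if the request or the reply was lost on the link.
Transport = Callable[[int], Optional[Tuple[bytes, int]]]


def download(fetch: Transport, n_blocks: int, max_tries: int = 5) -> Optional[bytes]:
    """Fetch every block with bounded retries, whatever the medium."""
    blocks: List[bytes] = []
    for index in range(n_blocks):
        for _ in range(max_tries):
            reply = fetch(index)
            if reply is not None and zlib.crc32(reply[0]) == reply[1]:
                blocks.append(reply[0])
                break
        else:
            return None  # link too poor for now; the caller can resume later
    return b"".join(blocks)


IMAGE = bytes(range(200))
BLOCK = 32
N_BLOCKS = (len(IMAGE) + BLOCK - 1) // BLOCK


def wired(index: int) -> Tuple[bytes, int]:
    """A clean link: every block arrives intact."""
    data = IMAGE[index * BLOCK:(index + 1) * BLOCK]
    return data, zlib.crc32(data)


def make_lossy_radio() -> Transport:
    """A poor link: of every three replies, one is dropped and one corrupted."""
    calls = {"n": 0}

    def fetch(index: int) -> Optional[Tuple[bytes, int]]:
        calls["n"] += 1
        if calls["n"] % 3 == 1:
            return None  # dropped packet
        data, crc = wired(index)
        if calls["n"] % 3 == 2:
            return bytes([data[0] ^ 0xFF]) + data[1:], crc  # bit errors in flight
        return data, crc

    return fetch


via_wire = download(wired, N_BLOCKS)
via_radio = download(make_lossy_radio(), N_BLOCKS)
via_dead_link = download(lambda index: None, N_BLOCKS)
```

Both working links yield the same image through the same device-side code, and a dead link fails cleanly rather than storing a partial image as if it were complete.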
If you are deploying devices globally and pushing out even a controlled update to groups of devices, it is difficult to know exactly the power and environmental conditions for every device. Yet we expect the power supply to operate exactly as it did in the lab during test, and to guarantee that it keeps our device in its ideal operating conditions. But what if it doesn’t? What if, say, an ESD event occurs on one group of devices and not on another? Our PSU design should also form part of the what-if scenarios and should feed into an overall strategy for resilience and recovery. Let’s say an ESD or power event occurs during a FUOTA download, resulting in a device reset or potentially even a latch-up. Some devices complete; others fail and have to recover, determine which blocks are missing, contact the server and advise it of their specific situation. The server in turn needs to be designed to cope with a potentially large number of devices all requesting different blocks, or maybe the whole image. Does it handle each of them individually, or does it send out the whole image again and allow the devices to listen for their specific missing blocks? If our end node is battery powered, does it have sufficient energy available to follow the defined update procedure? The power system on your board is as critical in the whole system scenario as the MCU, radio and other seemingly more complex and important functions.
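The recovery questions raised here can be made concrete with a small sketch. Everything in it is an assumption for illustration: the function names, the cost model (one unit per block transmitted) and the energy figures are hypothetical, and a real deployment would tune the policy to its own network and battery characteristics.

```python
from typing import Dict, List, Set


def missing_blocks(received: Set[int], total: int) -> List[int]:
    """Device side: after a reset, work out which blocks still need fetching."""
    return [i for i in range(total) if i not in received]


def plan_resend(requests: Dict[str, List[int]], total: int) -> Dict[str, object]:
    """Server side: per-device repair, or one rebroadcast of the whole image.

    Hypothetical policy: if repairing devices individually would mean sending
    more blocks than one full broadcast, rebroadcast and let each device
    listen for the blocks it is missing.
    """
    unicast_cost = sum(len(blocks) for blocks in requests.values())
    if unicast_cost > total:
        return {"mode": "broadcast", "blocks": list(range(total))}
    return {"mode": "unicast", "per_device": requests}


def can_afford_update(battery_mj: float, blocks_needed: int,
                      mj_per_block: float, reserve_mj: float) -> bool:
    """Battery node: only start if the update fits above the energy reserve."""
    return battery_mj - reserve_mj >= blocks_needed * mj_per_block


TOTAL = 10
device_a = missing_blocks({0, 1, 2, 3, 4, 5, 6, 7}, TOTAL)  # reset near the end
device_b = missing_blocks(set(), TOTAL)                      # lost everything

plan_small = plan_resend({"a": device_a}, TOTAL)
plan_large = plan_resend({"a": device_a, "b": device_b}, TOTAL)

enough = can_afford_update(500.0, len(device_a), 20.0, 100.0)
not_enough = can_afford_update(500.0, 30, 20.0, 100.0)
```

With one device missing two blocks the server repairs it individually; once the combined requests exceed the size of the image it switches to a rebroadcast. The energy check is deliberately crude, but it shows the power budget sitting in the update decision path rather than being an afterthought.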
If it Never happens and your system is Never hacked…
- Be thankful. You may just be lucky, or your diligent planning and preventative design approach made the challenge difficult enough that the casual attacker went elsewhere.
- There is no perfect security and it will be challenged in time. But we can utilise the best techniques and capabilities available today and plan for a reasonable amount of future proofing. If our system is designed to leverage root of trust devices and to be updateable in a secure manner, then we can leverage the more flexible cloud side of our system to handle the majority of the burden, provided we have designed with this in mind.
- Approach the design using the techniques available through organisations and standards such as the IoT Security Foundation, UK.gov Secure by Design, UL 2900, ISA/IEC 62443 and ISASecure, and others as they emerge.
- Build in sufficient headroom to allow for inevitable code size increase.
Design for the worst case and then make conscious compromises, rather than designing for cost without considering the what-if scenarios. IoT devices are a huge potential attack surface for hackers, nefarious actors and just smart people out to have a bit of fun.
The reasons why your particular device is attacked don’t have to make sense to you or me, but the potential damage to people, products, plant, brand and companies can be significant.
But ‘It won’t happen to me’ is no defence when it does.
Principal Embedded Solutions Engineer : Wireless, Cloud and IoT Systems – Microchip Technology Inc.
Ian has 20+ years’ experience designing embedded systems. For the last 10+ years he has supported connected embedded systems, focusing originally on Ethernet and TCP/IP, and has since been active in bringing Microchip Wireless Solutions in Wi-Fi, Bluetooth and LoRa into the embedded space. Ian has been active in IoT since the early days and is a vocal advocate for the design of secure IoT systems.
IoT Design : An Approach to design resilient, robust, reliable secure systems - Date published: 18th February 2019 by Farnell