As you may have seen, I sent the following Tweet: “The Apple ARM MacBook future is coming, maybe sooner than people expect” https://twitter.com/choco_bit/status/1266200305009676289?s=20
Today, I would like to further elaborate on that. tl;dr Apple will be moving to Arm based macs in what I believe are 4 stages, starting around 2015 and ending around 2023-2025: Release of T1 chip Macbooks, release of T2 chip Macbooks, Release of at least one lower end model Arm Macbook, and transitioning full lineup to Arm. Reasons for each are below.
Apple is very likely going to switch to switch their CPU platform to their in-house silicon designs with an ARM architecture. This understanding is a fairly common amongst various Apple insiders. Here is my personal take on how this switch will happen and be presented to the consumer.
The first question would likely be “Why would Apple do this again?”. Throughout their history, Apple has already made two other storied CPU architecture switches - first from the Motorola 68k to PowerPC in the early 90s, then from PowerPC to Intel in the mid 2000s. Why make yet another? Here are the leading reasons:
- Intel has, in recent years, been making significant losses both in reputation and in actual product value, as well as velocity of product development, breaking their bi-yearly “Tick Tock” cycle for the first time in decades. Most recently, they have fallen well behind AMD’s processor lines in cost to performance ratio, CPU core count, core design (monolithic design vs “chiplet”), power consumption to performance, silicon supply (Intel with significant manufacturing process and yield issues), and on-silicon security features. While Intel still wins out in certain enterprise and datacenter applications, as well as having a much better reputation for reliability and QA (AMD having shipped numerous chips with a broken random- number generator that prevented even booting some mainstream operating system), the number of such applications slowly dwindles with each new release from AMD, and as confidence among decisionmakers in enterprise increases. In the public consciousness, Intel is quickly becoming a point of ridicule against Apple’s Mac lineup, rather than a badge of honor.
- By moving to their own designs, Apple will be free from Intel’s release schedule, which have recently been unpredictable and faced with routine delays due to poor manufacturing yields. Apple will be able to update their Mac lineup on their own timeline, rather than being forced to delay products based on Intel’s ability to meet the release window. This also allows them to leverage relationships with other silicon fabricators to source chips, rather than relying on Intel ’s continued “iteration” that’s leading to a “14nm++++++++++” process, or the continued lack of product diversity with the 10nm process. Apple will also be free to innovate in the design of the silicon platform, rather than being limited by Intel’s design choices. By having full control of the manufacturing and development cycle, Apple can bring even more in-house optimization to the macOS, as they have been for iOS and iPadOS over the years.
- Using an ARM architecture on the Macs allows for a more unified Apple ecosystem, rather than having separate Mac and iOS-based products. The only distinction will be the device form factor and performance characteristics.
- The x86_64 architecture is very old and inefficient, using older methodologies for processor design (CISC vs ARM’s RISC), and the instruction set continues to require support in silicon for emulating 1980s-vintage 16-bit modes, as well as ineffectual and archaic memory addressing modes (segmentation, etc.) The x86_64 architecture is like a city, built atop a much older city, built atop a yet older city, but every layer is built with NYC infrastructure levels of complexity that suited its time and no further.
- Over the last 10 years, Apple has shown that they can consistently produce impressive silicon designs, often leading the market in performance and capability, and Apple has been aggressively acquiring silicon design talent.
A common refrain heard on the Internet is the suggestion that Apple should switch to using CPUs made by AMD, and while this has been considered internally, it will most likely not be chosen as the path forward, even for their megalithic giants like the Mac Pro. Even though AMD would mitigate Intel’s current set of problems, it does nothing to help the issue of the x86_64 architecture’s problems and inefficiencies, on top of jumping to a platform that doesn’t have a decade of proven support behind it. Why spend a lot of effort re-designing and re- optimizing for AMD’s platform when you can just put that effort into your own, and continue the vertical integration Apple is well-known for?
I believe that the internal development for the ARM transition started around 2015/2016 and is considered to be happening in 4 distinct stages. These are not all information from Apple insiders; some of these these are my own interpretation based off of information gathered from supply-chain sources, examination of MacBook schematics, and other indicators from Apple.
Stage1 (from 2014/2015 to 2017):
The rollout of computers with Apple’s T1 chip as a coprocessor. This chip is very similar to Apple’s T8002 chip design, which was used for the Apple Watch Series 1 and Series 2. The T1 is primarily present on the first TouchID enabled Macs, 2016 and 2017 model year MacBook Pros.
Considering the amount of time required to design and validate a processor, this stage most likely started around 2014 or 2015, with early experimentation to see whether an entirely new chip design would be required, or if would be sufficient to repurpose something in the existing lineup. As we can see, the general purpose ARM processors aren’t a one- trick pony.
To get a sense of the decision making at the time, let’s look back a bit. The year is 2016, and we're witnessing the beginning of stagnation of Intel processor lineup. There is not a lot to look forward to other than another “+” being added to the 14nm fabrication process. The MacBook Pro has used the same design for many years now, and its age is starting to show. Moving to AMD is still very questionable, as they’ve historically not been able to match Intel’s performance or functionality, especially at the high end, and since the “Ryzen” lineup is still unreleased, there is absolutely no benchmarks or other data to show they are worth consideration, and AMD’s most recent line of “Bulldozer” processors were very poorly received. Now is probably as good a time as any to begin experimenting with the in-house ARM designs, but it’s not time to dive into the deep end yet, our chips are not nearly mature enough to compete, and it’s not yet certain how long Intel will be stuck in the mud. As well, it is widely understood that Apple and Intel have an exclusivity contract in exchange for advantageous pricing. Any transition would take considerable time and effort, and since there are no current viable alternative to Intel, the in-house chips will need to advance further, and breaching a contract with Intel is too great a risk. So it makes sense to start with small deployments, to extend the timeline, stretch out to the end of the contract, and eventually release a real banger of a Mac.
Thus, the 2016 Touch Bar MacBooks were born, alongside the T1 chip mentioned earlier. There are good reasons for abandoning the piece of hardware previously used for a similar purpose, the SMC or System Management Controller. I suspect that the biggest reason was to allow early analysis of the challenges that would be faced migrating Mac built- in peripherals and IO to an ARM-based controller, as well as exploring the manufacturing, power, and performance results of using the chips across a broad deployment, and analyzing any early failure data, then using this to patch any issues, enhance processes, and inform future designs looking towards the 2nd stage.
The former SMC duties now moved to T1 includes things like
- Fan speed, voltage, amperage and thermal sensor feedback data
- FaceTime camera and microphone IO
- PMIC (Power Management Controller)
- Direct communication to NAND (solid state storage)
- Direct communication with the Touch Bar
- Secure Enclave for TouchID
The T1 chip also communicates with a number of other controllers to manage a MacBook’s behavior. Even though it’s not a very powerful CPU by modern standards, it’s already responsible for a large chunk of the machine’s operation. Moving control of these peripherals to the T1 chip also brought about the creation of the fabled BridgeOS software, a shrunken-down watchOS-based system that operates fully independently of macOS and the primary Intel processor.
BridgeOS is the first step for Apple’s engineering teams to begin migrating underlying systems and services to integrate with the ARM processor via BridgeOS, and it allowed internal teams to more easily and safely develop and issue firmware updates. Since BridgeOS is based on a standard and now well-known system, it means that they can leverage existing engineering expertise to flesh out the T1’s development, rather than relying on the more arcane and specialized SMC system, which operates completely differently and requires highly specific knowledge to work with. It also allows reuse of the same fabrication pipeline used for Apple Watch processors, and eliminated the need to have yet another IC design for the SMC, coming from a separate source, to save a bit on cost.
Also during this time, on the software side, “Project Marzipan”, today Catalyst, came into existence. We'll get to this shortly.
For the most part, this Stage 1 went without any major issues. There were a few firmware problems at first during the product launch, but they were quickly solved with software updates. Now that engineering teams have had experience building for, manufacturing, and shipping the T1 systems, Stage 2 would begin.
Stage 2 encompasses the rollout of Macs with the T2 coprocessor, replacing the T1. This includes a much wider lineup, including MacBook Pro with Touch Bar, starting with 2018 models, MacBook Air starting with 2018 models, the iMac Pro, the 2019 Mac Pro, as well as Mac Mini starting in 2018.
With this iteration, the more powerful T8012 processor design was used, which is a further revision of the T8010 design that powers the A10 series processors used in the iPhone 7. This change provided a significant increase in computational ability and brought about the integration of even more devices into T2. In addition to the T1’s existing responsibilities, T2 now controls:
- Full audio subsystem
- Secure Enclave for internal NAND storage and encryption/decryption offload
- Management of the whole system’s power and startup sequence, allowing for trusted boot (ensure boot chain-of-trust with no malicious code/rootkit/bootkit)
Those last 2 points are crucial for Stage 2. Under this new paradigm, the vast majority of the Mac is now under the control of an in-house ARM processor. Stage 2 also brings iPhone-grade hardware security to the Mac. These T2 models also incorporated a supported DFU (Device Firmware Update, more commonly “recovery mode”), which acts similarly to the iPhone DFU mode and allows restoration of the BridgeOS firmware in the event of corruption (most commonly due to user-triggered power interruption during flashing).
Putting more responsibility onto the T2 again allows for Apple’s engineering teams to do more early failure analysis on hardware and software, monitor stability of these machines, experiment further with large-scale production and deployment of this ARM platform, as well as continue to enhance the silicon for Stage 3.
A few new user-visible features were added as well in this stage, such as support for the passive “Hey Siri” trigger, and offloading image and video transcoding to the T2 chip, which frees up the main Intel processor for other applications. BridgeOS was bumped to 2.0 to support all of these changes and the new chip.
On the macOS software side, what was internally known as Project Marzipan was first demonstrated to the public. Though it was originally discovered around 2017, and most likely began development and testing within later parts of Stage 1, its effects could be seen in 2018 with the release of iPhone apps, now running on the Mac using the iOS SDKs: Voice Recorder, Apple News, Home, Stocks, and more, with an official announcement and public release at WWDC in 2019. Catalyst would come to be the name of Marzipan used publicly. This SDK release allows app developers to easily port iOS apps to run on macOS, with minimal or no code changes, and without needing to develop separate versions for each. The end goal is to allow developers to submit a single version of an app, and allow it to work seamlessly on all Apple platforms, from Watch to Mac. At present, iOS and iPadOS apps are compiled for the full gamut of ARM instruction sets used on those devices, while macOS apps are compiled for x86_64. The logical next step is to cross this bridge, and unify the instruction sets.
With this T2 release, the new products using it have not been quite as well received as with the T1. Many users have noticed how this change contributes further towards machines with limited to no repair options outside of Apple’s repair organization, as well as some general issues with bugs in the T2.
Products with the T2 also no longer have the “Lifeboat” connector, which was previously present on 2016 and 2017 model Touch Bar MacBook Pro. This connector allowed a certified technician to plug in a device called a CDM Tool (Customer Data Migration Tool) to recover data off of a machine that was not functional. The removal of this connector limits the options for data recovery in the event of a problem, and Apple has never offered any data recovery service, meaning that a irreparable failure of the T2 chip or the primary board would result in complete data loss, in part due to the strong encryption provided by the T2 chip (even if the data got off, the encryption keys were lost with the T2 chip). The T2 also brought about the linkage of component serial numbers of certain internal components, such as the solid state storage, display, and trackpad, among other components. In fact, many other controllers on the logic board are now also paired to the T2, such as the WiFi and Bluetooth controller, the PMIC (Power Management Controller), and several other components. This is the exact same system used on newer iPhone models and is quite familiar to technicians who repair iPhone logic boards. While these changes are fantastic for device security and corporate and enterprise users, allowing for a very high degree of assurance that devices will refuse to boot if tampered with in any way - even from storied supply chain attacks, or other malfeasance that can be done with physical access to a machine - it has created difficulty with consumers who more often lack the expertise or awareness to keep critical data backed up, as well as the funds to perform the necessary repairs from authorized repair providers. Other issues reported that are suspected to be related to T2 are audio “cracking” or distortion on the internal speakers, and the BridgeOS becoming corrupt following a firmware update resulting in a machine that can’t boot.
I believe these hiccups will be properly addressed once macOS is fully integrated with the ARM platform. This stage of the Mac is more like a chimera of an iPhone and an Intel based computer. Technically, it does have all of the parts of an iPhone present within it, cellular radio aside, and I suspect this fusion is why these issues exist.
Recently, security researchers discovered an underlying security problem present within the Boot ROM code of the T1 and T2 chip. Due to being the same fundamental platform as earlier Apple Watch and iPhone processors, they are vulnerable to the “checkm8” exploit (CVE-2019-8900). Because of how these chips operate in a Mac, firmware modifications caused by use of the exploit will persist through OS reinstallation and machine restarts. Both the T1 and T2 chips are always on and running, though potentially in a heavily reduced power usage state, meaning the only way to clean an exploited machine is to reflash the chip, triggering a restart, or to fully exhaust or physically disconnect the battery to flush its memory. Fortunately, this exploit cannot be done remotely and requires physical access to the Mac for an extended duration, as well as a second Mac to perform the change, so the majority of users are relatively safe. As well, with a very limited execution environment and access to the primary system only through a “mailbox” protocol, the utility of exploiting these chips is extremely limited. At present, there is no known malware that has used this exploit. The proper fix will come with the next hardware revision, and is considered a low priority due to the lack of practical usage of running malicious code on the coprocessor.
At the time of writing, all current Apple computers have a T2 chip present, with the exception of the 2019 iMac lineup. This will change very soon with the expected release of the 2020 iMac lineup at WWDC, which will incorporate a T2 coprocessor as well.
Note: from here on, this turns entirely into speculation based on info gathered from a variety of disparate sources.
Right now, we are in the final steps of Stage 2. There are strong signs that an a MacBook (12”) with an ARM main processor will be announced this year at WWDC (“One more thing...”), at a Fall 2020 event, Q1 2021 event, or WWDC 2021. Based on the lack of a more concrete answer, WWDC2020 will likely not see it, but I am open to being wrong here.
Stage3 (Present/2021 - 2022/2023):
Stage 3 involves the first version of at least one fully ARM-powered Mac into Apple’s computer lineup.
I expect this will come in the form of the previously-retired 12” MacBook. There are rumors that Apple is still working internally to perfect the infamous Butterfly keyboard, and there are also signs that Apple is developing an A14x based processors with 8-12 cores designed specifically for use as the primary processor in a Mac. It makes sense that this model could see the return of the Butterfly keyboard, considering how thin and light it is intended to be, and using an A14x processor would make it will be a very capable, very portable machine, and should give customers a good taste of what is to come.
Personally, I am excited to test the new 12" “ARMbook”. I do miss my own original 12", even with all the CPU failure issues those older models had. It was a lovely form factor for me.
It's still not entirely known whether the physical design of these will change from the retired version, exactly how many cores it will have, the port configuration, etc. I have also heard rumors about the 12” model possibly supporting 5G cellular connectivity natively thanks to the A14 series processor. All of this will most likely be confirmed soon enough.
This 12” model will be the perfect stepping stone for stage 3, since Apple’s ARM processors are not yet a full-on replacement for Intel’s full processor lineup, especially at the high end, in products such as the upcoming 2020 iMac, iMac Pro, 16” MacBook Pro, and the 2019 Mac Pro.
Performance of Apple’s ARM platform compared to Intel has been a big point of contention over the last couple years, primarily due to the lack of data representative of real-world desktop usage scenarios. The iPad Pro and other models with Apple’s highest-end silicon still lack the ability to execute a lot of high end professional applications, so data about anything more than video editing and photo editing tasks benchmarks quickly becomes meaningless. While there are completely synthetic benchmarks like Geekbench, Antutu, and others, to try and bridge the gap, they are very far from being accurate or representative of the real real world performance in many instances. Even though the Apple ARM processors are incredibly powerful, and I do give constant praise to their silicon design teams, there still just isn’t enough data to show how they will perform for real-world desktop usage scenarios, and synthetic benchmarks are like standardized testing: they only show how good a platform is at running the synthetic benchmark. This type of benchmark stresses only very specific parts of each chip at a time, rather than how well it does a general task, and then boil down the complexity and nuances of each chip into a single numeric score, which is not a remotely accurate way of representing processors with vastly different capabilities and designs. It would be like gauging how well a person performs a manual labor task based on averaging only the speed of every individual muscle in the body, regardless of if, or how much, each is used. A specific group of muscles being stronger or weaker than others could wildly skew the final result, and grossly misrepresent performance of the person as a whole. Real world program performance will be the key in determining the success and future of this transition, and it will have to be great on this 12" model, but not just in a limited set of tasks, it will have to be great at *everything*. It is intended to be the first Horseman of the Apocalypse for the Intel Mac, and it better behave like one. Consumers have been expecting this, especially after 15 years of Intel processors, the continued advancement of Apple’s processors, and the decline of Intel’s market lead.
The point of this “demonstration” model is to ease both users and developers into the desktop ARM ecosystem slowly. Much like how the iPhone X paved the way for FaceID-enabled iPhones, this 12" model will pave the way towards ARM Mac systems. Some power-user type consumers may complain at first, depending on the software compatibility story, then realize it works just fine since the majority of the computer users today do not do many tasks that can’t be accomplished on an iPad or lower end computer. Apple needs to gain the public’s trust for basic tasks first, before they will be able to break into the market of users performing more hardcore or “Pro” tasks. This early model will probably not be targeted at these high-end professionals, which will allow Apple to begin to gather early information about the stability and performance of this model, day to day usability, developmental issues that need to be addressed, hardware failure analysis, etc. All of this information is crucial to Stage 4, or possibly later parts of Stage 3.
The 2 biggest concerns most people have with the architecture change is app support and Bootcamp.
Any apps released through the Mac App Store will not be a problem. Because App Store apps are submitted as LLVM IR (“Bitcode”), the system can automatically download versions compiled and optimized for ARM platforms, similar to how App Thinning on iOS works. For apps distributed outside the App Store, thing might be more tricky. There are a few ways this could go:
- Developer will need to build both x86_64 and ARM version of their app - App Bundles have supported multiple-architecture binaries since the dawn of OS X and the PowerPC transition
- Move to apps being distributed in an architecture-independent manner, as they are on the App Store. There is some software changes that are suggestive of this, such as the new architecture in dyld3.
- An x86_64 instruction decoder in silicon - very unlikely due to the significant overhead this would create in the silicon design, and potential licensing issues. (ARM, being a RISC, “reduced instruction set”, has very few instructions; x86_64 has thousands)
- Server-side ahead-of-time transpilation (converting x86 code to equivalent ARM code) using Notarization submissions - Apple certainly has the compiler chops in the LLVM team to do something like this
- Outright emulation, similar to the approach that was taken in ARM releases of Windows, but received extremely poorly (limited to 32-bit apps, and very very slow)There could be other solutions in the works to fix this but I am not aware of any. This is just me speculating about some of the possibilities.
As for Bootcamp, while ARM-compatible versions of Windows do exist and are in development, they come with their own similar set of app support problems. Microsoft has experimented with emulating x86_64 on their ARM-based Surface products, and some other OEMs have created their own Windows-powered ARM laptops, but with very little success. Performance is a problem across the board, with other ARM silicon not being anywhere near as advanced, and with the majority of apps in the Windows ecosystem that were not developed in-house at Microsoft running terribly due to the x86_64 emulation software. If Bootcamp does come to the early ARM MacBook, it more than likely will run like very poorly for anything other than Windows UWP apps. There is a high chance it will be abandoned entirely until Windows becomes much more friendly to the architecture.
I believe this will also be a very crucial turning point for the MacBook lineup as a whole. At present, the iPad Pro paired with the Magic Keyboard is, in many ways, nearly identical to a laptop, with the biggest difference being the system software itself. While Apple executives have outright denied plans of merging the iPad and MacBook line, that could very well just be a marketing stance, shutting the down rumors in anticipation of a well-executed surprise. I think that Apple might at least re-examine the possibility of merging Macs and iPads in some capacity, but whether they proceed or not could be driven by consumer reaction to both products. Do they prefer the feel and usability of macOS on ARM, and like the separation of both products? Is there success across the industry of the ARM platform, both at the lower and higher end of the market? Do users see that iPadOS and macOS are just 2 halves of the same coin? Should there be a middle ground, and a new type of product similar to the Surface Book, but running macOS? Should Macs and iPads run a completely uniform OS? Will iPadOS ever see exposed the same sort of UNIX-based tools for IT administrators and software developers that macOS has present? These are all very real questions that will pop up in the near future.
The line between Stage 3 and Stage 4 will be blurry, and will depend on how Apple wishes to address different problems going forward, and what the reactions look like. It is very possible that only 12” will be released at first, or a handful more lower end model laptop and desktop products could be released, with high performance Macs following in Stage 4, or perhaps everything but enterprise products like Mac Pro will be switched fully. Only time will tell.
Stage 4 (the end goal):
Congratulations, you’re made it to the end of my TED talk. We are now well into the 2020s and COVID-19 Part 4 is casually catching up to the 5G = Virus crowd. All Macs have transitioned fully to ARM. iMac, MacBooks Pro and otherwise, Mac Pro, Mac Mini, everything. The future is fully Apple from top to bottom, and vertical integration leading to market dominance continues. Many other OEM have begun to follow in this path to some extent, creating more demand for a similar class of silicon from other firms.
The remainder here is pure speculation with a dash of wishful thinking. There are still a lot of things that are entirely unclear. The only concrete thing is that Stage 4 will happen when everything is running Apple’s in- house processors.
By this point, consumers will be quite familiar with the ARM Macs existing, and developers have had have enough time to transition apps fully over to the newly unified system. Any performance, battery life, or app support concerns will not be an issue at this point.
There are no more details here, it’s the end of the road, but we are left with a number of questions.
It is unclear if Apple will stick to AMD's GPUs or whether they will instead opt to use their in-house graphics solutions that have been used since the A11 series of processors.
How Thunderbolt support on these models of Mac will be achieved is unknown. While Intel has made it openly available for use, and there are plans to have USB and Thunderbolt combined in a single standard, it’s still unclear how it will play along with Apple processors. Presently, iPhones do support connecting devices via PCI Express to the processor, but it has only been used for iPhone and iPad storage. The current Apple processors simply lack the number of lanes required for even the lowest end MacBook Pro. This is an issue that would need to be addressed in order to ship a full desktop-grade platform.
There is also the question of upgradability for desktop models, and if and how there will be a replaceable, socketed version of these processors. Will standard desktop and laptop memory modules play nicely with these ARM processors? Will they drop standard memory across the board, in favor of soldered options, or continue to support user-configurable memory on some models? Will my 2023 Mac Pro play nicely with a standard PCI Express device that I buy off the shelf? Will we see a return of “Mac Edition” PCI devices?
There are still a lot of unknowns, and guessing any further in advance is too difficult. The only thing that is certain, however, is that Apple processors coming to Mac is very much within arm’s reach.
I am an embedded electronics guy who has several years of experience in the industry, mainly with writing embedded software in C at the high level and the low level. My goal is to start fresh with some projects in terms of software platforms, so I have been looking at whether to use existing programming languages. I want my electronics / software to be open, but therein lies part of the problem. I have experience using and evaluating many compilers during my experience such as the proprietary stuff (IAR) and open source stuff (clang , gcc, etc.). I have nothing against the open source stuff; however, the companies I have worked for (and I) always come crawling back to IAR. Why? Its not a matter of the compiler believe it or not! Its a matter of the linker.
I took a cursory look at the latest gnu / clang linkers and I do not think that have fixed the major issue we always had with these linkers: memory flood fill. Specifying where each object or section is in the memory is fine for small projects or very small teams (1 to 2 people). However, when you have a bigger team (> 2) and you are using microcontrollers with segmented memory (all memory blocks are not contiguous), memory flood fill becomes a requirement of the linker. Often is the case that the MCUs I and others work on do not have megabytes of memory, but kilobytes. The MCU is chosen for the project and if we are lucky to get one with lots of memory, then you know why such a chip was chosen - there is a large memory requirement in the software.. we would not choose a large memory part if we did not need it due to cost. Imagine a developer is writing a library or piece of code whose memory requirement is going to change by single or tens kilobytes each (added or subtracted) commit. Now imagine having to have this developer manually manage the linker script for their particular dev station each time to make sure the linker doesn't cough based on what everybody else has put it in there. On top of that, they need to manually manage the script if it needs to be changed when they commit and hope that nobody else needed to change it as well for whatever they were developing. For even a small amount of developers, manually managing the script has way too many moving parts to be efficient. Memory flood fill solves this problem. IAR (in addition to a few other linkers like Segger's) allow me to just say: "Here are the ten memory blocks on the device. I have a .text section. You figure out how to spread out all the data across those blocks." No manual script modifications required by each developer for their current dev or requirement to sync at the end when committing. It just works.
Now.. what's the next problem? I don't want to use IAR (or Segger)! Why? If my stuff is going to be open to the public on my repositories.. don't you think it sends the wrong message if I say: "Well, here is the source code everybody! But Oh sorry, you need to get a seat of IAR if you want to build it the way I am or figure out how to build it yourself with your own tool chain". In addition, let's say that we go with Segger's free stuff to get by the linker problem. Well, what if I want to make a sellable product based on the open software? Still need to buy a seat, because Segger only allows non commercial usage of their free stuff. This leaves me with using an open compiler.
To me, memory flood fill for the linker is a requirement. I will not use a C tool chain that does not have this feature. My compiler options are clang, gcc, etc. I can either implement a linker script generator or a linker itself. Since I do not need to support dynamic link libraries or any complicated virtual memory stuff in the linker, I think implementing a linker is easily doable. The linker script generator is the simple option, but its a hack and therefore I would not want to partake in it. Basically before the linker (LD / LLD) is invoked, I would go into all the object files and analyze all of their memory requirements and generate a linker script that implements the flood fill as a pre step. Breaking open ELF files and analyzing them is pretty easy - I have done it in the past. The pre step would have my own linker script format that includes provisions for memory flood fill. Since this is like invoking the linker twice.. its a hack and speed detriment for something that I think should have been a feature of LD / LLD decades ago. "Everybody is using gnu / clang with LD / LLD! Why do you think you need flood fill?" To that I respond with: "People who are using gnu / clang and LD / LLD are either on small teams (embedded) OR they are working with systems that have contiguous memory and don't have to worry about segmented memory. Case and point Phones, Laptops, Desktops, anything with external RAM" Pick one reason. I am sure there are other reasons beyond those two in which segmented memory is not an issue. Maybe the segmented memory blocks are so large that you can ignore most of them for one program - early Visual GDB had this issue.. you would go into the linker scripts to find that for chips like the old NXP 4000 series that they were only choosing a single RAM block for data memory because of the linker limitation. This actually horrendously turned off my company from using gnu / clang at the time. In embedded systems where MCUs are chosen based on cost, the amount of memory is specifically chosen to meet that cost. You can't just "ignore" a memory block due to linker limitations. This would require either to buy a different chip or more expensive chip that meets the memory requirements.
ANYWAYS.. long winded prelude to what has led me to looking at making my own programming language. TLDR: I want my software to be open.. I want people to be able to easily build it without shelling out an arm and a leg, and I am a person who is not fond of hacks because of, what I believe, are oversights in the design of existing software.
Why not use Rust, Nim, Go, Zig, any of those languages? No. Period. No. I work with small embedded systems running with small memory microcontrollers as well as a massive number of other companies / developers. Small embedded systems are what make most of the world turn. I want a systems programming language that is as simple as C with certain modern developer "niceties". This does not mean adding the kitchen sink.. generics, closures, classes ................ 50 other things because the rest of the software industry has been using these for years on higher level languages. It is my opinion that the reason that nothing has (or will) displace C in the past, present, or near future is because C is stupid simple. Its basically structures, functions, and pointers... that's it! Does it have its problems? Sure! However, at the end of the day developers can pick up a C program and go without a huge hassle. Why can't we have a language that sticks to this small subset or "core" functionality instead of trying to add the kitchen sink with all these features of other languages? Just give me my functions and structures, and iterate on that. Let's fix some of the developer productivity issues while we are at it.. and no I don't mean by adding generics and classes. I mean more of getting rid of header files and allowing CTFE. "D is what you want." No.. no it's not. That is a prime example of kitchen sink and the kitchen sink of 50 large corporations across the the block.
What are the problems I think need to be solved in a C replacement?
- Header files.
- Implementation hiding. Don't know the size of that structure without having to manually manage the size of that structure in a header or exposing all the fields of that structure in a header. Every change of the library containing that structure causes a recompile all the way up the chain on all dependencies.
- CTFE (compile time function execution). I want to be able to assign type safe constants to things on initialization.
- Pointers replaced with references? I am on the fence with this one. I love the power of pointers, but I realize after research where the industry is trying to go.
These are the things I think that need to be solved. Make my life easier as a developer, but also give me something as stupid simple as C.
I have some ideas of how to solve some of these problems. Disclaimer: some things may be hypocritical based on the prelude discussion; however, as often is the case, not 'every' discussion point is black and white.
- Header Files
Replace with a module / package system. There exists a project folder wherein there lies a .build script. The compiler runs the build script and builds the project. Building is part of the language / compiler, but dependency and versioning is not. People will be on both sides of the camp.. for or against this. However, it appears that most module type languages require specifying all of the input files up front instead of being able to "dumb compile" like C / C++ due to the fact that all source files are "truly" dumbly independent. Such a module build system would be harder to make parallel due to module dependencies; however, in total, required build "computation" (not necessarily time) is less. This is because the compiler knows everything up front that makes a library and doesn't have to spawn a million processes (each taking its own time) for each source file.
- Implementation hiding
What if it was possible to make a custom library format for the language? Libraries use this custom format and contain "deferrals" for a lot of things that need to be resolved. During packaging time, the final output stage, link time, whatever you want to call it (the executable output), the build tool resolves all of the deferrals because it now knows all parts of input "source" objects. What this means is that the last stage of the build process will most likely take the longest because it is also the stage that generates the code.
What is a deferral? Libraries are built with type information and IR like code for each of the functions. The IR code is a representation that can be either executed by interpreter (for CTFE) or converted to binary instructions at the last output stage. A deferral is a node within the library that requires to be resolved at the last stage. Think of it like an unresolved symbol but for mostly constants and structures.
Inside my library A I have a structure that has a bunch of fields. Those fields may be public or private. Another library B wants to derive from that structure. It knows the structure type exists and it has these public fields. The library can make usage of those public fields. Now at the link stage the size of the structure and all derivative structures and fields are resolved. A year down the road library A changes to add a private field to the structure. Library B doesn't care as long as the type name of the structure or its public members that it is using are not changed. Pull in the new library into the link stage and everything is resolved at that time.
I am an advocate for just having plain old C structures but having the ability to "derive" sub structures. Structures would act the same exact way as in C. Let's say you have one structure and then in a second structure you put the first field as the "base" field. This is what I want to have the ability to do in a language.. but built in support for it through derivation and implementation hiding. Memory layout would be exactly like in C. The structures are not classes or anything else.
I have an array of I2C ports in a library; however, I have no idea how many I2C ports there should be until link time. What to do!? I define a deferred constant for the size of the array that needs to be resolved at link time. At link time the build file passes the constant into the library. Or it gets passed as a command line argument.
What this also allows me to do is to provide a single library that can be built using any architecture at link time.
Having safe type checked ways to define constants or whatever, filled in by the compiler, I think is a very good mechanism. Since all of the code in libraries is some sort of IR, it can be interpreted at link time to fill in all the blanks. The compiler would have a massive emphasis on analyzing which things are constants in the source code and can be filled in at link time.
There would exist "conditional compilation" in that all of the code exists in the library; however, at link time the conditional compilation is evaluated and only the areas that are "true" are included in the final output.
- Pointers & References & Type safety
I like pointers, but I can see the industry trend to move away from them in newer languages. Newer languages seem to kneecap them compared to what you can do in C. I have an idea of a potential fix.
Pointers or some way is needed to be able to access hardware registers. What if the language had support for references and pointers, but pointers are limited to constants that are filled in by the build system? For example, I know hardware registers A, B, and C are at these locations (maybe filled in by CTFE) so I can declare them as constants. Their values can never be changed at runtime; however, what a pointer does is indicate to the compiler to access a piece of memory using indirection.
There would be no way to convert a pointer to a reference or vise versa. There is no way to assign a pointer to a different value or have it point anything that exists (variables, byte arrays, etc..). Then how do we perform a UART write with a block of data? I said there would be no way to convert a reference ( a byte array for example) to a pointer, but I did not say you could not take the address of a reference! I can take the address of a reference (which points to a block of variable memory) and convert to it to an integer. You can perform any math you want with that integer but you can't actually convert that integer back into a reference! As far as the compiler is concerned, the address of a reference is just integer data. Now I can pass that integer into a module that contains a pointer and write data to memory using indirection.
As far as the compiler is concerned, pointers are just a way to tell the compiler to indirectly read and write memory. It would treat pointers as a way to read and write integer data to memory by using indirection. There exists no mechanism to convert a pointer to a reference. Since pointers are essentially constants, and we have deferrals and CTFE, the compiler knows what all those pointers are and where they point to. Therefore it can assure that no variables are ever in a "pointed to range". Additionally, for functions that use pointers - let's say I have a block of memory where you write to each 1K boundary and it acts as a FIFO - the compiler could check to make sure you are not performing any funny business by trying to write outside a range of memory.
What are references? References are variables that consist of say 8 bytes of data. The first 4 bytes are an address and the next 4 bytes is type information. There exists a reference type (any) that be used for assigning any type to it (think void*). The compiler will determine if casts are safe via the type information and for casts it can't determine at build time, it will insert code to check the cast using the type information.
Functions would take parameters as ByVal or ByRef. For example DoSomething(ByRef ref uint8 val, uint8 val2, uint8 arr). The first parameter is passing by reference a reference to a uint8 (think double pointer). Assigning to val assigns to the reference. The second parameter is passed by value. The third parameter (array type) is passed by reference implicitly.
- Other Notes
This is not an exhaustive list of all features I am thinking of. For example visibility modifiers - public, private, module for variables, constants, and functions. Additionally, things could have attributes like in C# to tell the compiler what to do with a function or structure. For example, a structure or field could have a volatile attribute.
I want integration into the language for inline assembly for the architecture. So you could place a function attribute like [Assembly(armv7)]. This could tell the compiler that the function is all armv7 assembly and the compiler will verify it. Having assembly integrated also allows all the language features to be available to the assembly like constants. Does this go against having an IR representation of the library? No. functions have weak or strong linkage. Additionally, there could be a function attribute to tell the compiler: "Hey when the link stage is using an armv7 target, build this function in". There could also be a mechanism for inline assembly and intrinsics.
Please keep in mind that my hope is not to see another C systems language for larger systems (desktop, phones, laptops, etc.) Its solely to see it for small embedded systems and microcontrollers. I think this is why many of the newer languages (Go, Nim, Zig, etc..) have not been adopted in embedded - they started large and certain things were tacked on to "maybe" support smaller devices. I also don't want to have a runtime with my embedded microcontroller; however, I am not averse to the compiler putting bounds checks and casting checks into the assembly when it needs to. For example, if a cast fails, the compiler could just trap in a "hook" defined by the user that includes the module and line number of where the cast failed. It doesn't even matter that the system hangs or locks up as long as I know where to look to fix the bug. I can't tell you how many times something like this would be invaluable for debugging. In embedded, many of us say that its better for the system to crash hard than limp along because of an array out of bounds or whatever. Maybe it would be possible to restart the system in the event of such a crash or do "something" (like for a cruise missile :)).
This is intended to be a discussion and not so much a religious war or to state I am doing this or that. I just wanted to "blurt out" some stuff I have had on my mind for awhile.