A Smart Way To Detect And Sort Code Snippets Anywhere
In this blog post, I’m going to give some power tips about how to improve your game when building a doc site, web app, or chatbot that deals with lots of code snippets, code editing, syntax highlighting, and/or documentation.
My code snippets are all over…
Like water, code snippets turn up everywhere: in developer docs, in messengers like Slack or Teams, and in productivity tools like Notion or Google Docs. And if I may say so myself, developers are often lazy, so snippets even get sent via email sometimes. This is not just an annoyance for the recipient; it’s a missed opportunity for the sender.
You see, code snippets can be hard to read when they’re not formatted correctly. Tags get lost and syntax gets messed up—not only does it look bad, but it makes the snippet impossible to understand without double-checking every single line. Code is meant to be read by machines as well as humans, and that means preserving formatting and syntax highlighting across different apps and devices.
The problem is that a lot of the apps we use aren’t built to handle this code effectively, so the burden of ensuring proper formatting falls on the sender. And to be honest? Developers are usually too lazy to do that. OKAY, I SAID IT AGAIN.
We’d rather copy-paste something quickly than bother with complicated workarounds, so we end up sending a lot of unformatted code snippets—which means recipients have to take extra time out of their days to fix them.
There should be a better way…
There are a lot of tools out there that help with code snippet syntax, but you have to know what you’re looking for. In this post, I’ll take you through some of the best options available today.
Can the classics help us? Highlight JS, Prism JS, and CodeMirror
The first time I saw Highlight JS at work, I was amazed – it was like being in the part of the movie where the nerdy kid becomes cool all of a sudden.
I was working for a university at the time, building a messaging platform that let users copy and paste code snippets into messages. And I thought, “Hey, what if the system could automatically detect whether this is code or natural language? And not only that, but what if it could also determine whether it’s Kotlin, TypeScript, or Dart? The system should then be able to format and syntax highlight it accordingly.”
That would be an amazing experience for users of my messaging platform. Imagine how much better blogs and forums would be with that kind of functionality!
Unfortunately, both the code and technical language generated by our users were unstructured and unlabeled, so they often weren’t formatted correctly or highlighted using their correct colors. We were distraught!
So what could we do? Well, first we looked at Highlight JS—it has automatic language detection and multi-language code highlighting, so that seemed like a good place to start. But alas! There were still problems.
You see, the problem with Highlight JS is that:
- It can’t tell you whether the user input is code or natural language. This turns out to be quite difficult to solve (spoiler alert: we found an easy solution, which I’ll show you below).
- On top of that, if you are passing in code, you have to specify the language before Highlight JS will highlight it correctly.
To be fair, Highlight JS does attempt to figure out what coding language has been entered by the user, but this can take up significant network and computing resources.
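For reference, highlight.js's automatic detection (`hljs.highlightAuto`) returns its best-guess `language` together with a numeric `relevance` score. The sketch below is an illustration, not part of the setup described in this post: it treats a low score as "probably not code". The `AutoResult` type mirrors only the two fields used here, `pickLanguage` is a hypothetical helper, and the threshold of 5 is an arbitrary assumption. Relevance was not designed as a code-versus-prose classifier, which is exactly the gap described above.

```typescript
// Minimal structural type for the fields we read from hljs.highlightAuto().
// Only `language` and `relevance` are used; the real result has more fields.
interface AutoResult {
  language?: string;
  relevance: number;
}

// Hypothetical helper: accept the guess only when the score clears a threshold.
// The default threshold (5) is an assumption chosen for illustration.
function pickLanguage(result: AutoResult, minRelevance = 5): string | null {
  if (!result.language || result.relevance < minRelevance) {
    return null; // treat as "unknown, possibly natural language"
  }
  return result.language;
}

// With the real library this would be called roughly as:
//   import hljs from "highlight.js";
//   const lang = pickLanguage(hljs.highlightAuto(userInput));
```

Even with a threshold like this, the whole detection bundle still has to be shipped and run on the client, so the cost discussed next applies either way.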
Highlight JS offers some pretty sweet features, but they come at a cost. For instance, you go to the Highlight JS website and download the source for their default 34 languages. Next thing you know, you’ve added ~1.6 megabytes to your bundle size. Now, I know what you’re thinking: “But this is minified!” And that’s true. The real problem comes when we add full language support: that clocks in at ~2.5 megabytes!
But here’s the thing: it doesn’t stop there. As we all know, with every action comes a reaction, and in this case it’s not just any reaction; it’s one of slowness. I’m talking about 300 to ~5600 milliseconds of download time, depending on your network speed, according to Highlight JS’s Bundlephobia report.
And if we’re being conservative about load time and say it takes 1500 milliseconds, you’re already negatively impacting your users’ experience—which is obviously not good.
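The load-time figures above come down to simple arithmetic: payload size divided by effective bandwidth. Here is a quick sketch of that calculation. The two bandwidth values are illustrative assumptions, not measurements from this post or from Bundlephobia, and the function ignores compression, latency, and parse time.

```typescript
// Estimate transfer time in milliseconds for a payload at a given bandwidth.
function downloadMs(bytes: number, bytesPerSecond: number): number {
  if (bytesPerSecond <= 0) throw new RangeError("bandwidth must be positive");
  return Math.round((bytes / bytesPerSecond) * 1000);
}

const MB = 1_000_000; // decimal megabyte

// Bundle sizes quoted in the post.
const defaultBundle = 1.6 * MB;
const fullBundle = 2.5 * MB;

// Assumed effective bandwidths in bytes per second, for illustration only.
const slowConnection = 400_000;
const fastConnection = 5_000_000;

console.log(downloadMs(defaultBundle, slowConnection)); // 4000
console.log(downloadMs(defaultBundle, fastConnection)); // 320
console.log(downloadMs(fullBundle, slowConnection));    // 6250
```

The point is the spread: the same bundle can cost a fraction of a second or several seconds depending on the connection.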
Now let’s say your users are willing to wait that long—and maybe they’ve gotten a beer and some chips too—and Highlight JS finally loads. It’s going to immediately start wreaking havoc on your users’ experience.
Highlight JS uses regular expressions under the hood, and on large inputs that work can easily freeze the user’s window or drop frames below the target 60fps. Long story short: props to Highlight JS for attempting to solve the problem of language detection, but we can do better.
So what can we do?
As skilled web developers, the answer seems simple enough: Throw out Highlight JS and develop our own highlighting library! After all, how hard could it be?
How about a New Stack: Prism JS + Code Detection API
After exploring Highlight JS and finding its limitations, my team at Pieces created the Code Detection API, which leverages machine learning to overcome Highlight JS’s limitations and improve app performance.
Code Detection API is an API that uses machine learning to detect code on web pages and other documents. It is available through our platform at runtime.dev and via apilayer.com.
Code Detection API is pretty much the best thing to happen to syntax highlighting since sliced bread. And, it happens to be sliced bread for a lot of reasons, including (but not limited to):
- It’s gloriously simple and performant.
- It can determine whether a given string/input is natural language or code/technical language.
- If it detects code, it will classify it with a high level of accuracy across up to 30 programming languages.
- And, it does all of this in approximately 250 milliseconds while minimizing the requirements of on-device computing.
Code Detection API does one thing and it does it great: detects code. It’s just as easy as grabbing the package, initializing it with your API key, and calling the detect function on any text you want to evaluate.
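This post doesn't show the package name or the exact call signature, so the following is only a sketch of the flow just described: initialize with an API key, call detect, read the result. Every name here (`CodeDetector`, `DetectionResult`, `createDetector`, `renderMode`, and the field names) is hypothetical, and the network call is replaced by an injected `transport` function so the example stays self-contained.

```typescript
// Hypothetical result shape; the real API's field names may differ.
interface DetectionResult {
  isCode: boolean;
  language: string | null; // e.g. "kotlin" when isCode is true
}

// Hypothetical transport: sends the key and text somewhere, resolves a result.
type Transport = (apiKey: string, text: string) => Promise<DetectionResult>;

interface CodeDetector {
  detect(text: string): Promise<DetectionResult>;
}

// Step 1: initialize with an API key (validated eagerly).
function createDetector(apiKey: string, transport: Transport): CodeDetector {
  if (apiKey.trim() === "") throw new Error("API key is required");
  return {
    // Step 2: call detect on any text you want to evaluate.
    detect: (text) => transport(apiKey, text),
  };
}

// Step 3: turn the result into a rendering decision.
function renderMode(result: DetectionResult): string {
  return result.isCode && result.language ? `code:${result.language}` : "text";
}
```

In an app, `transport` would wrap the real client or HTTP request, and something like `renderMode(await detector.detect(input))` would decide whether a message is rendered as prose or handed to a syntax highlighter.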
I’ve been looking for something like this for a while. I needed it so that I could start looking at other options for syntax highlighting—which is where Prism JS comes into play.
If you’re looking for a code syntax highlighter for your website, Prism JS is worth a look. It’s got a very small core, only about 2 kilobytes minified and gzipped, which means it adds far less to your page weight than Highlight JS does. And if you know what language your users are going to be highlighting, you can use its Autoloader plugin to load just the grammar for that language on demand! That should help to make your site load even faster.
Let’s take a look at Prism JS’s Bundlephobia report, where we can see the estimated download times. It looks like Prism JS clocks in at significantly faster load times than Highlight JS does!
But there’s more! When you use Prism JS with Code Detection API, the result is a significantly improved experience for both your users and you. Your users will benefit from the faster performance and a better experience where the code they use is treated like… well, code! And with beautiful syntax highlighting to boot!
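Here is a minimal sketch of that pairing, assuming the detection step has already produced a language name. `Prism.highlight(text, grammar, language)` and `Prism.languages` are part of Prism's documented API; the `PrismLike` interface below just mirrors those two members so the example runs without the library. The alias table and the `highlightDetected` helper are hypothetical.

```typescript
// Structural subset of the Prism object used here.
interface PrismLike {
  languages: Record<string, unknown>;
  highlight(text: string, grammar: unknown, language: string): string;
}

// Hypothetical mapping from detector labels to Prism language ids.
const PRISM_IDS = new Map<string, string>([
  ["TypeScript", "typescript"],
  ["Kotlin", "kotlin"],
  ["Dart", "dart"],
]);

// Escape text so it is safe inside <pre><code> when left unhighlighted.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Highlight when a grammar is loaded; otherwise fall back to escaped plain text.
function highlightDetected(
  prism: PrismLike,
  code: string,
  detected: string | null,
): string {
  const id = detected ? PRISM_IDS.get(detected) ?? detected.toLowerCase() : null;
  const grammar = id ? prism.languages[id] : undefined;
  return id && grammar ? prism.highlight(code, grammar, id) : escapeHtml(code);
}
```

The fallback matters: when nothing is detected, or the grammar for the detected language hasn't been loaded yet, the snippet still renders safely as plain text.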
When you’re building a product, there are times when you not only want your code to be highlighted for better readability but also editable, giving the user an interactive experience.
With Highlight JS, you can use a technique where you load specific dependencies without the Autodetection features and supplement those with Code Detection API, similar to what I do with Prism JS.
More interestingly, when paired with CodeMirror 6, Code Detection API detects the language you are writing and sets the correct syntax highlighting for you. No dropdowns or selections are necessary! Many of our users prefer this setup because they can simply begin writing code without any configuration.
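In CodeMirror 6, switching the language at runtime is typically done by wrapping the language extension in a `Compartment` and dispatching `compartment.reconfigure(...)`. The sketch below shows that one step, with small structural interfaces standing in for `Compartment` (from `@codemirror/state`) and `EditorView` (from `@codemirror/view`) so it runs without the editor installed. The `extensions` lookup table and `applyDetectedLanguage` are hypothetical names, and how the detected language arrives is left out.

```typescript
// Structural stand-ins for the two CodeMirror 6 members used here.
interface CompartmentLike {
  reconfigure(extension: unknown): unknown; // returns a state effect
}
interface ViewLike {
  dispatch(spec: { effects: unknown }): void;
}

// Swap the editor's language extension when a supported language is detected.
// Returns true if the editor was reconfigured, false if nothing changed.
function applyDetectedLanguage(
  view: ViewLike,
  languageSlot: CompartmentLike,
  extensions: Map<string, unknown>, // hypothetical: language name -> extension
  detected: string | null,
): boolean {
  if (detected === null) return false;
  const extension = extensions.get(detected);
  if (extension === undefined) return false;
  view.dispatch({ effects: languageSlot.reconfigure(extension) });
  return true;
}
```

With the real packages, `languageSlot` would be a `new Compartment()` whose `languageSlot.of(...)` was included in the editor's initial extensions, and the map values would be language extensions such as the result of `javascript()` from `@codemirror/lang-javascript`. Debouncing the detection call while the user types is a sensible addition.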
Here is the icing on the cake
You can create a better user experience with Code Detection API!
At runtime.dev, we believe that the work you do is important. Whether you’re building a chatbot, a documentation site, or a developer productivity tool like code.pieces.app, we think it’s important that your users have a good experience and don’t run into any bugs. And with our new Code Detection API, we can help you achieve greater UX in all of your projects.
The Code Detection API and products like Prism JS or CodeMirror 6 are mega-steps in the right direction for making this possible.
Let’s Wrap it Up! What next?
Check out Pieces—it’s an app that uses Code Detection API in production.
Pieces also does some cool work with Technical Language Processing, NLP, and OCR.
The Code Detection API is the first API for runtime.dev. In the next few weeks, more APIs will arrive: auto-tag generation from a code snippet and the extraction of code snippets from images.
When these capabilities land on runtime.dev, they’re going to make it easier than ever for app developers to improve Search, SEO, and Accessibility. When you can automatically generate meta-tags for Search and Alt-Text, it changes the game for SEO and Accessibility. Also, being able to automatically extract code from images on a webpage is going to change how people use in-browser features like Find in Page or Copy and Paste.
I hope you found these insights useful and will consider updating your user experience and performance around code snippets!