Google has released an HTML5 parser in pure C
Gumbo is a pure C99 implementation of the HTML5 parsing algorithm that does not depend on any external libraries. It was created so that third-party developers could build their own applications and utilities on top of it, such as validators, refactoring tools, and code analysis tools.
Features:
Full compliance with the HTML5 specification.
Robustness against invalid input.
Simple APIs that can easily be called from other languages.
Passes all html5lib-0.95 tests.
Tested on more than 2.5 billion pages from Google's index.
Limitations:
Only UTF-8 encoding is supported.
No support for C89.
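Because only UTF-8 is supported, input in any other encoding has to be converted before parsing. A caller may also want to confirm that the bytes really are UTF-8. The function below is a minimal sketch of such a check. It is not part of Gumbo, and the name is_valid_utf8 is hypothetical. It follows the well-formedness rules of RFC 3629, so it rejects overlong forms, surrogates, and code points above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Gumbo.
 * Returns true if buf[0..len) is well-formed UTF-8 as defined in RFC 3629:
 * overlong forms, UTF-16 surrogates (U+D800..U+DFFF) and code points
 * above U+10FFFF are all rejected. */
bool is_valid_utf8(const unsigned char *buf, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        if (c <= 0x7F) {            /* ASCII: a single byte */
            i++;
            continue;
        }
        size_t extra;               /* continuation bytes after the lead byte */
        unsigned char lo = 0x80;    /* allowed range for the second byte, */
        unsigned char hi = 0xBF;    /* narrowed for a few lead bytes below */
        if (c >= 0xC2 && c <= 0xDF) {
            extra = 1;
        } else if (c == 0xE0) {
            extra = 2; lo = 0xA0;   /* excludes overlong 3-byte forms */
        } else if (c == 0xED) {
            extra = 2; hi = 0x9F;   /* excludes surrogates */
        } else if (c >= 0xE1 && c <= 0xEF) {
            extra = 2;
        } else if (c == 0xF0) {
            extra = 3; lo = 0x90;   /* excludes overlong 4-byte forms */
        } else if (c == 0xF4) {
            extra = 3; hi = 0x8F;   /* caps the range at U+10FFFF */
        } else if (c >= 0xF1 && c <= 0xF3) {
            extra = 3;
        } else {
            return false;           /* stray continuation, C0/C1, or F5..FF */
        }
        if (len - i <= extra) {
            return false;           /* sequence truncated by end of buffer */
        }
        if (buf[i + 1] < lo || buf[i + 1] > hi) {
            return false;
        }
        for (size_t k = 2; k <= extra; ++k) {
            if (buf[i + k] < 0x80 || buf[i + k] > 0xBF) {
                return false;
            }
        }
        i += extra + 1;
    }
    return true;
}
```

A caller could run this check before handing a page to the parser. If the check fails, the caller could convert the page, for example with iconv, instead of parsing it as-is.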
To be added in the future:
Updates for the latest changes to the HTML5 specification.
Support for parsing HTML fragments.
Full-featured error reporting.
Bindings for other languages.
To use the parser, include gumbo.h.
W3C is already working on the next version of HTML
Recently, the W3C published an interesting document that invites discussion of what might be new in the next version of HTML (HTML6?). Those interested can read it in detail; here we will just briefly go over the list.
In particular, the W3C proposes several new semantic tags: Location (to mark the location of something, with proposed attributes such as latitude and longitude), datagrid (HTML already has tables, but it would be great to see something like this in the spec), Teaser, Editor, and others.
Several new features are proposed for the Web Forms standard, along with new APIs such as Adaptive Streaming and Video Metrics.
There is also a proposal to change how copy and paste works. When you copy part of a numbered list that does not start at the first item, the original numbering should be preserved. For example, if you copy the list <ol> <li> First <li> Second <li> Third <li> Fourth <li> Fifth </ol> starting from the third item, the pasted items should be numbered 3, 4, 5 rather than 1, 2, 3 as they are now.
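The proposal does not say how browsers would implement this. However, plain HTML can already express such a fragment through the start attribute of <ol>. Below is a minimal sketch of one way to serialize the copied part. The function copy_list_fragment is hypothetical. It assumes the items are already valid HTML snippets, so no escaping is done. It writes the items from a given zero-based index onward and sets start so that the pasted list keeps its numbers.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Appends src to out, failing instead of silently truncating. */
static int append(char *out, size_t out_size, size_t *used, const char *src) {
    size_t len = strlen(src);
    if (*used + len + 1 > out_size) {
        return -1;
    }
    memcpy(out + *used, src, len + 1);  /* also copies the terminating NUL */
    *used += len;
    return 0;
}

/* Hypothetical helper: serializes items[first..count-1] as an <ol> fragment
 * that keeps the original numbering. Returns 0 on success, -1 on bad
 * arguments or if out is too small. */
int copy_list_fragment(const char *const items[], size_t count, size_t first,
                       char *out, size_t out_size) {
    if (first >= count || out_size == 0) {
        return -1;
    }
    char open_tag[48];
    /* <ol> numbering is 1-based: the item at index `first` is number first + 1. */
    snprintf(open_tag, sizeof open_tag, "<ol start=\"%zu\">", first + 1);
    size_t used = 0;
    out[0] = '\0';
    if (append(out, out_size, &used, open_tag) != 0) {
        return -1;
    }
    for (size_t i = first; i < count; ++i) {
        if (append(out, out_size, &used, "<li>") != 0 ||
            append(out, out_size, &used, items[i]) != 0 ||
            append(out, out_size, &used, "</li>") != 0) {
            return -1;
        }
    }
    return append(out, out_size, &used, "</ol>");
}
```

Suppose the five-item list from the example is copied starting at index 2, the third item. The helper should then produce <ol start="3"><li>Third</li><li>Fourth</li><li>Fifth</li></ol>, which a browser renders as 3, 4, 5.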
Another proposal is a new tag that would insert code from an external file.
The changes would also affect the <code> tag, which would gain support for syntax highlighting.
Of course, it is not yet known whether all these changes will take effect, but they would all be very