
Subsystem Xml available on CPC

Posted: Mon Apr 30, 2018 7:57 am
by Josef Templ
I would like to announce the availability of a new subsystem called Xml on CPC.

This subsystem provides XML, HTML, and CSS parsers (and scanners), and it also provides
an HTML importer that can import 'HTML Format' clipboard data.
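For readers unfamiliar with 'HTML Format': on Windows, this clipboard format prefixes the HTML with a plain-text header of `Key:Value` lines (`Version`, `StartHTML`, `EndHTML`, `StartFragment`, `EndFragment`), where the offsets are byte positions into the UTF-8 data. The following Python sketch is not the Component Pascal code of the subsystem; it only illustrates, under that assumption, how the copied fragment can be located before it is handed to an HTML parser. The function names `extract_fragment` and `make_cf_html` are hypothetical, and optional header fields such as `SourceURL` are ignored.

```python
import re


def extract_fragment(cf_html: bytes) -> str:
    """Return the copied fragment from Windows 'HTML Format' clipboard bytes."""
    offsets = {}
    for key in (b"StartFragment", b"EndFragment"):
        # Header lines look like b"StartFragment:0000000123"
        m = re.search(rb"^" + key + rb":(\d+)", cf_html, re.MULTILINE)
        if m is None:
            raise ValueError("missing header field " + key.decode())
        offsets[key] = int(m.group(1))
    start, end = offsets[b"StartFragment"], offsets[b"EndFragment"]
    if not 0 <= start <= end <= len(cf_html):
        raise ValueError("inconsistent fragment offsets")
    # Offsets count bytes, so slice first and decode as UTF-8 afterwards.
    return cf_html[start:end].decode("utf-8")


def make_cf_html(fragment: str) -> bytes:
    """Build a minimal 'HTML Format' buffer for testing (zero-padded offsets)."""
    template = ("Version:0.9\r\nStartHTML:{0:010d}\r\nEndHTML:{1:010d}\r\n"
                "StartFragment:{2:010d}\r\nEndFragment:{3:010d}\r\n")
    prefix = b"<html><body><!--StartFragment-->"
    suffix = b"<!--EndFragment--></body></html>"
    body = fragment.encode("utf-8")
    # Zero padding keeps the header length independent of the offset values.
    start_html = len(template.format(0, 0, 0, 0).encode("ascii"))
    start_frag = start_html + len(prefix)
    end_frag = start_frag + len(body)
    end_html = end_frag + len(suffix)
    header = template.format(start_html, end_html, start_frag, end_frag)
    return header.encode("ascii") + prefix + body + suffix


print(extract_fragment(make_cf_html("<b>Hi</b> ä")))  # prints <b>Hi</b> ä
```

Slicing the raw bytes before decoding matters as soon as the data contains non-ASCII characters such as "ä", which occupy more than one byte.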

Since RTF (Rich Text Format) no longer seems to be supported by the major browsers (was it ever?),
this subsystem could be used to transfer formatted HTML data from the clipboard to BlackBox.

Technically, the HTML importer uses the HTML parser to translate the clipboard HTML data into a DOM structure in memory.
Then, the HTML importer converts the DOM into formatted BlackBox text as well as possible. Of course, not all
HTML constructs can be converted perfectly. Whenever the importer encounters a so-called inline style attribute,
it uses the CSS parser to parse that attribute. Known CSS style properties are then converted to the appropriate
text attributes and ruler settings. Importing is reasonably fast: about 10 times faster than in MS Word 2013.
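A rough Python sketch of the inline-style step, assuming a simple `name: value; ...` split (it does not handle semicolons inside quoted strings or `url(...)`) and a hypothetical attribute dictionary standing in for BlackBox text attributes and ruler settings. The functions `parse_inline_style` and `to_text_attributes` and the attribute keys are invented for illustration; unknown properties are simply ignored, mirroring the "known properties" rule.

```python
def parse_inline_style(style: str) -> dict:
    """Split a style attribute such as 'font-weight: bold; color: red'."""
    decls = {}
    for part in style.split(";"):
        name, sep, value = part.partition(":")
        name, value = name.strip().lower(), value.strip()
        if sep and name and value:
            decls[name] = value  # later declarations override earlier ones
    return decls


def to_text_attributes(style: str) -> dict:
    """Map a few known CSS properties to hypothetical text/ruler attributes."""
    decls = parse_inline_style(style)
    attrs = {}
    weight = decls.get("font-weight", "").lower()
    if weight.isdigit():
        attrs["bold"] = int(weight) >= 600  # numeric weights: 600 and up treated as bold
    elif weight == "bold":
        attrs["bold"] = True
    elif weight == "normal":
        attrs["bold"] = False
    if decls.get("font-style", "").lower() in ("italic", "oblique"):
        attrs["italic"] = True
    if "underline" in decls.get("text-decoration", "").lower().split():
        attrs["underline"] = True
    align = decls.get("text-align", "").lower()
    if align in ("left", "right", "center", "justify"):
        attrs["ruler_align"] = align  # a paragraph (ruler) setting, not a character one
    return attrs


print(to_text_attributes("font-weight: bold; FONT-STYLE: italic; text-align:center; foo: bar"))
```

The split between character attributes and the `ruler_align` entry reflects the distinction made in the post between text attributes and ruler settings.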

In order to avoid excessive creation of string objects on the heap, the HTML importer uses string pooling
for both the HTML and the CSS parser.
String pooling means that strings that occur multiple times are represented by a single shared object.
It is an optional feature of all included parsers.
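A minimal Python sketch of string pooling, assuming a dictionary that maps each distinct value to the first instance seen; the class `StringPool` and the toy scanner `scan_tag_names` are hypothetical and only show how a pool can be passed to a parser as an optional parameter.

```python
import re
from typing import Optional


class StringPool:
    """Keeps one shared instance per distinct string value."""

    def __init__(self) -> None:
        self._table: dict = {}

    def intern(self, s: str) -> str:
        # The first occurrence is stored; later equal strings get that same object back.
        return self._table.setdefault(s, s)

    def __len__(self) -> int:
        return len(self._table)


def scan_tag_names(text: str, pool: Optional[StringPool] = None) -> list:
    """Toy scanner: returns tag names, pooled only if a pool is supplied."""
    names = []
    for m in re.finditer(r"</?([A-Za-z][A-Za-z0-9]*)", text):
        name = m.group(1).lower()
        names.append(pool.intern(name) if pool is not None else name)
    return names


pool = StringPool()
names = scan_tag_names("<div><span>x</span></div>", pool)
print(names, len(pool))  # ['div', 'span', 'span', 'div'] 2
```

With the pool, the four tag names share two string objects; without it (`pool=None`), each occurrence may be a separate heap object.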

- Josef

Re: Subsystem W3c available on CPC

Posted: Mon May 07, 2018 7:03 am
by Josef Templ
The subsystem Xml now also provides JSON support
and it has been renamed from Xml to W3c.

Download from

- Josef

Re: Subsystem Xml available on CPC

Posted: Wed May 09, 2018 12:21 pm
by luowy
Hi Josef,
I found a problem: it traps when pasting an HTML fragment.
You can copy and paste the following HTML to reproduce the bug: <=>


Re: Subsystem Xml available on CPC

Posted: Thu May 10, 2018 2:44 am
by luowy
It has been solved.

Re: Subsystem Xml available on CPC

Posted: Thu May 10, 2018 7:29 am
by Josef Templ
I just noticed that the formatting of the imported page ( could be improved a bit.
In the header section it has a list of buttons, which currently appear one button per line in BlackBox.
The reason is that the HtmlImporter does not look at the 'display' property when
converting list items (<li>). In this case the display property is 'inline', whereas the importer
assumes the default 'block', which puts each item on its own line.
This will be fixed in a future version, probably together with further formatting improvements.
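The following Python sketch illustrates the kind of fix described: instead of assuming every `<li>` is block-level, the importer consults the `display` property and falls back to a per-element default only when the property is absent. It assumes the property is already available as a parsed inline-style dictionary; the default table, the button labels, and the functions `breaks_line` and `render_items` are hypothetical. CSS's actual default for `<li>`, `list-item`, is treated as block-level here.

```python
# Hypothetical per-element defaults; CSS gives <li> 'list-item', which is block-level.
DEFAULT_DISPLAY = {"li": "list-item", "p": "block", "div": "block",
                   "span": "inline", "a": "inline"}
BLOCK_LEVEL = {"block", "list-item", "table"}


def breaks_line(tag: str, decls: dict) -> bool:
    """True if the element should end its line in the imported text."""
    display = decls.get("display", DEFAULT_DISPLAY.get(tag, "inline"))
    return display.strip().lower() in BLOCK_LEVEL


def render_items(items: list) -> str:
    """items: (tag, parsed inline-style dict, text) triples."""
    parts = []
    for tag, decls, text in items:
        parts.append(text + ("\n" if breaks_line(tag, decls) else " "))
    return "".join(parts).rstrip(" ")


buttons = [("li", {"display": "inline"}, "Home"), ("li", {"display": "inline"}, "Forum")]
print(repr(render_items(buttons)))                               # 'Home Forum'
print(repr(render_items([(t, {}, s) for t, _, s in buttons])))  # 'Home\nForum\n'
```

The second call reproduces the current behaviour (one button per line), while the first shows the intended result when `display: inline` is honoured.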

- Josef