Subsystem Xml available on CPC
Posted: Mon Apr 30, 2018 7:57 am
I would like to announce the availability of a new subsystem called Xml on CPC.
See http://www.zinnamturm.eu/downloadsTZ.htm#Xml.
This subsystem provides XML, HTML, and CSS parsers (and scanners), as well as
an HTML importer that can import 'HTML Format' clipboard data.
Since RTF (rich text format) seems to be no longer supported by major browsers (was it ever supported?),
this subsystem could be used for transferring formatted HTML data from the clipboard to BlackBox.
Technically, the HTML importer uses the HTML parser to translate the clipboard HTML data into a DOM structure in memory.
Then, the HTML importer converts the DOM to formatted BlackBox text as well as possible. Of course, not all
HTML constructs can be converted perfectly. Whenever the importer sees a so-called inline style attribute,
it uses the CSS parser to parse the style attribute. Known CSS style properties are then converted to the appropriate
text attributes and ruler settings. Importing is reasonably fast, about 10 times faster than importing the same data into MS Word 2013.
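
To illustrate the inline style step, here is a minimal sketch in Python. It is not the Component Pascal code of the Xml subsystem; the function names and the KNOWN table are hypothetical and only show the idea of parsing a style attribute and keeping the properties the importer understands. Splitting on ';' and ':' is a simplification that a real CSS parser would not rely on (quoted strings and url(...) values may contain those characters).

```python
# Hypothetical table: CSS property -> name of a text attribute or ruler setting.
KNOWN = {
    "font-weight": "weight",
    "font-style": "style",
    "color": "color",
    "text-align": "alignment",
}


def parse_inline_style(style: str) -> dict:
    """Split 'prop: value; prop: value' into a dict (simplified)."""
    props = {}
    for decl in style.split(";"):
        name, sep, value = decl.partition(":")
        name, value = name.strip().lower(), value.strip()
        # Skip empty or malformed declarations instead of failing.
        if sep and name and value:
            props[name] = value
    return props


def to_text_attributes(style: str) -> dict:
    """Keep only known properties; unknown ones are silently ignored."""
    props = parse_inline_style(style)
    return {KNOWN[p]: v for p, v in props.items() if p in KNOWN}


attrs = to_text_attributes("font-weight: bold; COLOR: red; cursor: pointer")
```

Here 'cursor' has no counterpart in the table, so it is dropped, which mirrors the remark that not all constructs can be converted.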
In order to avoid excessive string object creation (on the heap), the HTML importer uses string pooling
for both the HTML and the CSS parsers.
String pooling means that strings that occur multiple times are represented by a single shared object.
It is an optional feature of all included parsers.
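
A minimal sketch of the string pooling idea, in Python. The StringPool class is a hypothetical illustration and not the interface of the Xml subsystem; it only shows how equal strings can be mapped to one shared object.

```python
class StringPool:
    """Maps equal strings to a single shared object (hypothetical helper)."""

    def __init__(self):
        self._pool = {}

    def intern(self, s: str) -> str:
        # Return the pooled object if an equal string was seen before;
        # otherwise store this one and return it.
        return self._pool.setdefault(s, s)

    def __len__(self):
        return len(self._pool)


pool = StringPool()
# Build the tag names at run time, as a scanner would for each occurrence.
names = ["".join(["d", "i", "v"]) for _ in range(3)]
pooled = [pool.intern(n) for n in names]
```

After pooling, all three entries in 'pooled' refer to the same object and the pool holds one string. A parser that sees the same tag or property name thousands of times would keep only one copy of each.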
- Josef