Utf8 converter is missing in HostTextConv.

Posted: Mon Apr 22, 2019 5:51 pm
by Zinn
In the past I worked with Linux on my notebook and with Windows 10 on my desktop computer. Now I have changed my desktop to Linux Mint Cinnamon 19.1 Tessa as well.
Now I have realised that Windows log and text files still use the old code page format, whereas Linux uses the modern UTF-8 format.

In HostTextConv we have converters for Text, RichText and Unicode, but a converter for UTF-8 is missing.

Shouldn't we add the procedures ImportUtf8 & ExportUtf8 from CpcUtf8Conv to the HostTextConv?
What is your opinion?

- Helmut

Re: Utf8 converter is missing in HostTextConv.

Posted: Thu Apr 25, 2019 5:26 pm
by Ivan Denisov
Good initiative.

Re: Utf8 converter is missing in HostTextConv.

Posted: Fri Apr 26, 2019 5:36 am
by Ivan Denisov
Zinn wrote: Now I have changed my desktop to Linux Mint Cinnamon 19.1 Tessa as well.
I also installed it yesterday :)

After a BIOS update in Windows destroyed the order of the UEFI boot entries from which Debian usually started, I decided to try Mint, because it uses my favorite Cinnamon desktop (I used it in Debian) and at the same time is based on Ubuntu, with good support for CUDA and Nvidia drivers (which I need for molecular modeling).

I have checked that BlackBox for Linux works well if you use this set of commands:

Code: Select all

wget http://dev.obertone.ru/dev.obertone.ru.gpg.key
sudo apt-key add dev.obertone.ru.gpg.key
sudo dpkg --add-architecture i386
echo "deb [arch=i386] http://dev.obertone.ru/bionic testing main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install bbcb

Re: Utf8 converter is missing in HostTextConv.

Posted: Fri Apr 26, 2019 9:23 pm
by Robert
Zinn wrote: Shouldn't we add the procedures ImportUtf8 & ExportUtf8 from CpcUtf8Conv to the HostTextConv?
What is your opinion?
Seems like a reasonable idea to me.

Are the existing converters documented anywhere - should they be?

Re: Utf8 converter is missing in HostTextConv.

Posted: Sat Apr 27, 2019 10:02 am
by Josef Templ
Adding UTF-8 converters is a good idea in my view.
What would be the file name extension? Is there any standard for that?

- Josef

Re: Utf8 converter is missing in HostTextConv.

Posted: Sat Apr 27, 2019 12:04 pm
by Zinn
Josef Templ wrote: What would be the file name extension? Is there any standard for that?
In Windows I don't know of any standard extension.
In Linux I often use it with the extension .sh (bash scripts).
Originally the importer/exporter was used by Romiras with the extension .cp

Furthermore, in Windows a line ends with CR and LF,
and in Linux with LF only.

- Helmut

Re: Utf8 converter is missing in HostTextConv.

Posted: Tue Apr 30, 2019 7:21 am
by Zinn
I prefer to select all files, because all txt and log files in Linux use UTF-8.
- Helmut

Re: Utf8 converter is missing in HostTextConv.

Posted: Wed May 08, 2019 7:14 pm
by Josef Templ
.utf8 seems to be the 'official' extension for UTF-8 text files.
In addition, there could be the Converters.importAll flag set for such a converter.
I don't know exactly what it changes, though.
For decoding arbitrary binary files, however, this may lead to utf-8 decoding errors and the
question is how to handle them, e.g. output a question mark or similar.

When reading, CR-LF and LF can both be treated equally.
When writing, it may be a good idea to adapt to the platform (Win or Wine)
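
The error-handling question could be approached roughly as follows. This is only a sketch under assumptions: the module name SketchUtf8 and the procedures ReadCont and ReadCharChecked are hypothetical and not part of BlackBox or CpcUtf8Conv, the question mark is just one possible substitute, and the code assumes that Stores.Reader offers Pos alongside SetPos. Sequences longer than three bytes do not fit into a 16-bit CHAR, so each of their bytes is replaced as well.

```
MODULE SketchUtf8;	(* hypothetical module, for illustration only *)
	IMPORT Stores;

	CONST replacement = "?";	(* assumed substitute for malformed input *)

	(* Read one continuation byte (10xx xxxx) and append its 6 bits to code.
		On failure clear ok and step back, so that the caller re-reads the byte. *)
	PROCEDURE ReadCont (VAR rd: Stores.Reader; VAR code: INTEGER; VAR ok: BOOLEAN);
		VAR b: BYTE; pos: INTEGER;
	BEGIN
		IF ok THEN
			pos := rd.Pos(); rd.ReadByte(b);
			IF ~rd.rider.eof & (b MOD 256 DIV 64 = 2) THEN
				code := code * 64 + b MOD 64
			ELSE
				ok := FALSE; rd.SetPos(pos)
			END
		END
	END ReadCont;

	PROCEDURE ReadCharChecked* (VAR rd: Stores.Reader; OUT ch: CHAR);
		VAR b: BYTE; c, code: INTEGER; ok: BOOLEAN;
	BEGIN
		rd.ReadByte(b); c := b MOD 256;	(* map signed BYTE to 0..255 *)
		ok := TRUE;
		IF c < 80H THEN	(* 0xxx xxxx: ASCII *)
			code := c
		ELSIF (c >= 0C2H) & (c < 0E0H) THEN	(* 110x xxxx: two bytes; C0H and C1H would be overlong *)
			code := c MOD 32; ReadCont(rd, code, ok)
		ELSIF (c >= 0E0H) & (c < 0F0H) THEN	(* 1110 xxxx: three bytes *)
			code := c MOD 16; ReadCont(rd, code, ok); ReadCont(rd, code, ok);
			(* reject overlong forms and the surrogate range *)
			ok := ok & (code >= 800H) & ((code < 0D800H) OR (code > 0DFFFH))
		ELSE	(* stray continuation byte or a lead byte of a longer sequence *)
			ok := FALSE
		END;
		IF ok THEN ch := CHR(code) ELSE ch := replacement END
	END ReadCharChecked;

END SketchUtf8.
```

With a converter registered for all files, arbitrary binary input would then simply show up as a run of question marks instead of being decoded into unrelated characters.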

- Josef

Re: Utf8 converter is missing in HostTextConv.

Posted: Sat May 11, 2019 5:55 pm
by Zinn
Hello Josef,
the module at http://www.zinnamturm.eu/downloadsAC.htm#CpcUtf8Conv works as you describe when reading CR-LF and LF.
Writing is adapted to the Wine platform, but it does not check for any errors. The part about SetFont should be deleted.
Of course the extension can be changed to .utf8
Helmut

Re: Utf8 converter is missing in HostTextConv.

Posted: Mon Jul 01, 2019 6:28 am
by Zinn
Here is my change proposal for this topic:

Host/Mod/TextConv.odc
(1)

Code: Select all

	PROCEDURE ReadChar (VAR rd: Stores.Reader; VAR ch: CHAR);
		VAR
			c1, c2, c3: BYTE;	(* BYTE is signed: bytes 80H..FFH are read as -128..-1 *)
	BEGIN	(* UTF-8 format, sequences of 1 to 3 bytes; the input is not validated *)
		rd.ReadByte(c1);
		IF c1 >= 0 THEN	(* 0xxx xxxx: ASCII *)
			ch := CHR(c1)
		ELSE	(* lead byte: 110x xxxx (C0H..DFH = -64..-33) or 1110 xxxx (E0H..EFH = -32..-17) *)
			rd.ReadByte(c2);
			ch := CHR(64 * (c1 MOD 32) + (c2 MOD 64));
			IF c1 >= -32 THEN	(* 1110 xxxx: three-byte sequence *)
				rd.ReadByte(c3);
				ch := CHR(4096 * (c1 MOD 16) + 64 * (c2 MOD 64) + (c3 MOD 64))
			END
		END
	END ReadChar;

	PROCEDURE ImportUtf8* (f: Files.File; OUT s: Stores.Store);
		VAR r: Stores.Reader; t: TextModels.Model; wr: TextModels.Writer; ch, nch: CHAR;
	BEGIN
		ASSERT(f # NIL, 20);
		r.ConnectTo(f); r.SetPos(0);
		t := TextModels.dir.New(); wr := t.NewWriter(NIL);
		ReadChar(r, ch);
		WHILE ~r.rider.eof DO
			ReadChar(r, nch);
			IF (ch = CR) & (nch = LF) THEN ReadChar(r, nch)
			ELSIF ch = LF THEN ch := CR
			END;
			wr.WriteChar(ch);
			ch := nch;
		END;
		s := TextViews.dir.New(t)
	END ImportUtf8;
Host/Mod/TextConv.odc
(2)

Code: Select all

	PROCEDURE WriteChar (VAR wr: Stores.Writer; ch: CHAR);
	BEGIN	(* UTF-8 format *)
		IF ch <= 7FX THEN
			wr.WriteByte(SHORT(SHORT(ORD(ch))))
		ELSIF ch <= 7FFX THEN
			wr.WriteByte(SHORT(SHORT( - 64 + ORD(ch) DIV 64)));
			wr.WriteByte(SHORT(SHORT( - 128 + ORD(ch) MOD 64)))
		ELSE
			wr.WriteByte(SHORT(SHORT( - 32 + ORD(ch) DIV 4096)));
			wr.WriteByte(SHORT(SHORT( - 128 + ORD(ch) DIV 64 MOD 64)));
			wr.WriteByte(SHORT(SHORT( - 128 + ORD(ch) MOD 64)))
		END
	END WriteChar;

	PROCEDURE ExportUtf8* (s: Stores.Store; f: Files.File);
		VAR w: Stores.Writer; t: TextModels.Model; r: TextModels.Reader; ch: CHAR;
	BEGIN
		ASSERT(s # NIL, 20); ASSERT(f # NIL, 21);
		s := TextView(s);
		IF s # NIL THEN
			w.ConnectTo(f); w.SetPos(0);
			t := s(TextViews.View).ThisModel();
			IF t # NIL THEN
				r := t.NewReader(NIL);
				r.ReadChar(ch);
				WHILE ~r.eot DO
					IF ch = CR THEN WriteChar(w, LF) ELSE WriteChar(w, ch) END;
					r.ReadChar(ch)
				END
			END
		END
	END ExportUtf8;
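
The signed-byte arithmetic in ReadChar and WriteChar may be easier to follow with one worked value. The following is a minimal sketch, not part of the proposal: the module name SketchUtf8Check and the command Do are hypothetical. It encodes the character 0E9X (U+00E9) with the same expressions as the two-byte branch of WriteChar and decodes it again with the expression used in ReadChar.

```
MODULE SketchUtf8Check;	(* hypothetical module, for illustration only *)

	PROCEDURE Do*;
		VAR ch: CHAR; b1, b2: BYTE;
	BEGIN
		ch := 0E9X;	(* ORD(ch) = 233, inside the two-byte range 80H..7FFH *)
		(* 233 DIV 64 = 3 and 233 MOD 64 = 41 *)
		b1 := SHORT(SHORT(-64 + ORD(ch) DIV 64));	(* -64 + 3 = -61, the signed view of C3H *)
		b2 := SHORT(SHORT(-128 + ORD(ch) MOD 64));	(* -128 + 41 = -87, the signed view of A9H *)
		ASSERT((b1 = -61) & (b2 = -87), 100);
		(* MOD always yields a non-negative result: -61 MOD 32 = 3, -87 MOD 64 = 41 *)
		ASSERT(CHR(64 * (b1 MOD 32) + b2 MOD 64) = ch, 101)
	END Do;

END SketchUtf8Check.
```

Because BYTE is signed, the constants -64, -32 and -128 play the role of the bit patterns C0H, E0H and 80H, and MOD strips the marker bits again on the way back.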
System/Mod/Config.odc

Code: Select all

	Converters.Register("HostTextConv.ImportUtf8", "HostTextConv.ExportUtf8", "TextViews.View", "utf8", {Converters.importAll});
	Converters.Register("HostTextConv.ImportText", "HostTextConv.ExportText", "TextViews.View", "txt", {});
Host/Rsrc/Strings.odc

Code: Select all

HostTextConv.ImportUtf8	Linux Text
HostTextConv.ImportText	Window Text
...
HostTextConv.ExportUtf8	Linux Plain Text
HostTextConv.ExportText	Window Plain Text
These changes work under Wine on my Linux system.
- Helmut