Strings concatenation optimisation

User avatar
Ivan Denisov
Posts: 362
Joined: Tue Sep 17, 2013 12:21 am
Location: Krasnoyarsk, Russia

Strings concatenation optimisation

Post by Ivan Denisov »

Please take a look for the optimization feature suggested by Oleg.
https://forum.oberoncore.ru/viewtopic.p ... 11#p108850
Josef Templ
Posts: 262
Joined: Tue Sep 17, 2013 6:50 am

Re: Strings concatenation optimisation

Post by Josef Templ »

I cannot see the problem with statements like s := s + "xxx" in BB.
Can you give an example test case that fails in BB?

- Josef
luowy
Posts: 87
Joined: Thu Dec 17, 2015 1:32 pm

Re: Strings concatenation optimisation

Post by luowy »

the output code is correct, Oleg complain the qulity of the output code:
the

Code: Select all

 a :=a +'str';
output

Code: Select all

a :=a; a+="str";
instead of

Code: Select all

 a.append('str');
copy(write) itself is not necessary and inefficient, get(read) it's length is better;

but I think the code generater is good enough,look the main code of a:=a+'str';

Code: Select all

LEA	EDI, [EBP-200]	
MOV	ECX, 100	
;===========================
LEA	ESI, [EBP-200]	
LODSW		
OR	AX, AX	
JZ	7 (66830099H)	
STOSW	               ; add di,2	
DEC	ECX	
JNZ	-12 (6683008BH)	
TRAP	8	; copyTrap
;============================
MOV	ESI, 66830068H	
LODSW		
STOSW	

the optmization what I can do is:

Code: Select all

 add di,2;  STOSW	  	
Is it worth it? need change the OPV OPC OPL three file;
the string plus only suitable for short string, large one will cause stack overflow;

Code: Select all

	PROCEDURE Do* ();
		VAR s: POINTER TO ARRAY OF CHAR;  
	BEGIN
		NEW(s, 1024 * 1024 * 2);
		s^ := s^ + 'a';   // ok 
		s^ := 'a' + s^;  (*stack ovreflow!! *)
	END Do;
the code use stack buffer when needed;

the optmization is valuable for large string operation, untill now we only use it for short ones;
Josef Templ
Posts: 262
Joined: Tue Sep 17, 2013 6:50 am

Re: Strings concatenation optimisation

Post by Josef Templ »

> the output code is correct, Oleg complain the qulity of the output code:

Thanks for the clarification.

In general, when optimizing statements like
s := s + ... ;
one must be careful about detecting the correct pattern.
The following statement may fail if there is no buffer allocated for the resulting string.
s := s + "xxx" + s;

I hope that Oleg checked for this.

- Josef
Zorko
Posts: 26
Joined: Tue Sep 17, 2013 10:13 pm
Location: Ukraine
Contact:

Re: Strings concatenation optimisation

Post by Zorko »

Now the operation of the form

Code: Select all

s := s$ + "something"
implemented as follows: the string s is copied into itself in a loop, which also searches for the terminating 0X and also checks the string (buffer) length, for generating trap when out of the string. Basic copying is done using instructions lodsw:stosw (for CHAR) and lodsb:stosb (for SHORTCHAR).

It is obvious that the optimization of this solution may not be strong, so I would have just suggest to remove copying. Let only stays search for 0X with a checking for out-of-string case. How exactly to implement it better - I have not researched. The i386 processor doesn't have a stand-alone command for searching 0X in a memory, as far as I know. Therefore, the search cycle is still inevitable.
Zorko
Posts: 26
Joined: Tue Sep 17, 2013 10:13 pm
Location: Ukraine
Contact:

Re: Strings concatenation optimisation

Post by Zorko »

What about

Code: Select all

s := s + "xxx" + s;
, such operation is now decomposed into three operations:

Code: Select all

s := s; s += "xxx"; s += s;
And the buffer overflow check happens in every operation here. And the search for the final 0X occurs here even twice.
luowy
Posts: 87
Joined: Thu Dec 17, 2015 1:32 pm

Re: Strings concatenation optimisation

Post by luowy »

Rethink Oleg's question, the strings operation need some improvement;

1, the form: a:=a+'str'; (no buffer needed)
old : a :=a; a += 'str';
new: a := (scan 0X); a+="str";
improve point: read a instead copy a;

2,the form: a := b + a; (need buffer)
old: a' =b; a'+= a; a:=a';
new: a':=a; a:=b; a+=a';
improve point: use a' as src buffer, short the copy length;

3,do we need restrict the buffer length ?
threshold =4096;
if LEN(buf)>threshold AllocHeap() or error(I preffer)
else AllocStack()
end;

luowy
Josef Templ
Posts: 262
Joined: Tue Sep 17, 2013 6:50 am

Re: Strings concatenation optimisation

Post by Josef Templ »

Zorko wrote:What about

Code: Select all

s := s + "xxx" + s;
, such operation is now decomposed into three operations:

Code: Select all

s := s; s += "xxx"; s += s;
And the buffer overflow check happens in every operation here. And the search for the final 0X occurs here even twice.
If s = "" at the beginning, then s := s + "xxx" + s; should set s to "xxx", right?
With the proposed decomposed form you would get "xxxxxx", which is wrong.

- Josef
Zorko
Posts: 26
Joined: Tue Sep 17, 2013 10:13 pm
Location: Ukraine
Contact:

Re: Strings concatenation optimisation

Post by Zorko »

luowy is right. I like that he is open to new ideas.

Josef has a point too.

Please, answer the question: do we have a technical possibility at the time of constructing the concatenation operation of the form s := s + "a" + s to find out whether the string s is present in the chain of added strings? If the operation does not contain the source string to concatenation, then we do not need a buffer at all. We use this string as a buffer, thus saving and optimizing the concatenation.

Code: Select all

s := s + "a" + b + c; (* s can be used as buffer *)
s := "a" + b + c + s; (* s can not be used as buffer *)
So our task is to find: can a string be the buffer or a separate buffer is required. I'm going to implement this strategy in Ofront+

As far as I understand, now BlackBox always uses the buffer.
luowy
Posts: 87
Joined: Thu Dec 17, 2015 1:32 pm

Re: Strings concatenation optimisation

Post by luowy »

Zorko wrote: As far as I understand, now BlackBox always uses the buffer.
no! the current compiler has already done this optimization, check my previous post (form 1); or you can check it by decoder(e.g. open code file)
Post Reply