Rawhed's Tutorial #2:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
The How to add two 16bit RGB565 pixels together nicely and also save lots of 
			registers in the process document :)

       by Rawhed(Andrew Griffiths)/Sensory Overload - 20 May 1999

                           andrew@overload.co.za
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Right, I'm assuming you've dealt with 16-bit color before, so I won't explain the 
basics. A very cool thing you can do with 16-bit color is layering. Yes, just 
like Photoshop. Layering and transparency are very cool effects, but they are 
actually very processor-intensive, because you have to separate the red, green 
and blue values out of every pixel, work on each one, and pack them back 
together. So for example, here is a standard 50% transparency routine: 
    
	mov ax,[edi]    ;getpixel from buf1
	mov bx,[esi]    ;getpixel from buf2
	mov cx,ax       ;work copies, ax and bx keep the originals
	mov dx,bx       ;(could also use the stack I know...)
	
	                ;now separate into RGB
	                ;RED
	shr cx,11
	shr dx,11
	add cx,dx
	shr cx,1        ;/2 for transparency
	shl cx,11
	push cx         ;ran out of registers...so stack it
	
	                ;GREEN
	mov cx,ax       ;fresh copies, the shifts above wrecked cx and dx
	mov dx,bx
	and cx,0000011111100000b
	and dx,0000011111100000b
	shr cx,5
	shr dx,5
	add cx,dx
	shr cx,1        ;/2 for transparency
	shl cx,5
	pop dx          ;red result
	add cx,dx       ;red+green
	push cx         ;ran out of registers...so stack it

	                ;BLUE
	mov cx,ax       ;fresh copies again
	mov dx,bx
	and cx,0000000000011111b
	and dx,0000000000011111b
	add cx,dx
	shr cx,1        ;/2 for transparency
	pop dx          ;red+green result
	add cx,dx

	mov [edi],cx    ;write the RGB pixel to the screen
                
Yes, I know it's not the best implementation, but I think it shows how you have 
to split the RGB and deal with each channel separately. And you have to either 
push & pop, or you run out of registers. It's disgusting. You could tweak this 
to be a bit better, but it's still yucky. I tried a better way. 
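For comparison, here is the same split-average-pack idea as a C sketch 
(blend50_rgb565 is just a name made up for this example, not from any library). 
Every channel needs its own extract, add, halve and shift, which is exactly why 
the register version gets so crowded:

```c
#include <stdint.h>

/* Per-channel 50% blend of two RGB565 pixels: pull out R, G and B,
   average each channel on its own (rounding down), then pack them
   back into 5:6:5. */
static uint16_t blend50_rgb565(uint16_t p1, uint16_t p2)
{
    unsigned r = (((p1 >> 11) & 0x1F) + ((p2 >> 11) & 0x1F)) >> 1;
    unsigned g = (((p1 >> 5) & 0x3F) + ((p2 >> 5) & 0x3F)) >> 1;
    unsigned b = ((p1 & 0x1F) + (p2 & 0x1F)) >> 1;
    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

So blend50_rgb565(0xFFFF, 0x0000), white over black, comes out as 0x7BEF: red 
15, green 31, blue 15.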

I'd been thinking of this way for a while (ever since I started RGB color 
coding), but for some reason never thought it would work. But I just tried it, 
and since it works I'm writing this document :)

It is cooler because it only uses 2 registers, EAX and EBX. Nothing else, and 
no stack. It's also cleaner, and it does some of the work with 32-bit 
instructions. Freeing up registers is important because then you can optimise 
the rest of your inner loop A LOT. Basically I thought that when you go: 
    
	mov ax,[edi]
	mov bx,[esi]
                
Then why couldn't you just go: 
    
	add ax,bx
                
And that would add the pixel colours together. Well, it doesn't work, because 
the fields overflow into each other: a carry out of the blue seeps into the 
green, and a carry out of the green seeps into the red. So you DO have to work 
with R, G and B separately. So here is what I thought of: 

If you have the pixel1 color in AX, and the top half of EAX was cleared first 
(xor eax,eax before the load, or load it with movzx), then EAX contains 16 zero 
bits and then the data. The same goes for pixel2 in BX and EBX. So if you could 
somehow use that zero area in the registers as a buffer zone to spread the RGB 
values out across, you could add the colors together without the red, green or 
blue interfering with each other. Understand? hehe. 
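To see the overflow problem in numbers, here's a tiny C sketch of what a plain 
add ax,bx does to two RGB565 pixels (the function names are made up for the 
example):

```c
#include <stdint.h>

/* Pull the three fields out of an RGB565 pixel. */
static unsigned red565(uint16_t p)   { return (p >> 11) & 0x1F; }
static unsigned green565(uint16_t p) { return (p >> 5) & 0x3F; }
static unsigned blue565(uint16_t p)  { return p & 0x1F; }

/* What "add ax,bx" does: one 16-bit add that ignores the field
   boundaries. A carry out of blue lands in green, a carry out of
   green lands in red, and a carry out of red falls off the top. */
static uint16_t naive_add565(uint16_t p1, uint16_t p2)
{
    return (uint16_t)(p1 + p2);
}
```

For example, naive_add565(0x001F, 0x0001) gives 0x0020: blue 31 plus blue 1 
carries into green, so you get blue 0 and green 1 instead of a brighter blue. 
And naive_add565(0xF800, 0x0800) gives 0, because the carry out of red falls 
off the top of AX.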

Ok, here are EAX and EBX right after you read the pixels into them (the top line 
numbers the bits: f down to 0 for bits 31-16, then f down to 0 again for bits 
15-0): 
    
	fedcba9876543210fedcba9876543210
	0000000000000000RRRRRGGGGGGBBBBB       <---source pixel1 EAX
	0000000000000000RRRRRGGGGGGBBBBB       <---source pixel2 EBX
                
        
Now, if you could get it to look like this: 
    
	fedcba9876543210fedcba9876543210
	000RRRRR00GGGGGG000BBBBB00000000       <---rearranged source pixel1 EAX
	000RRRRR00GGGGGG000BBBBB00000000       <---rearranged source pixel2 EBX
                
        
Then you could just go: 
    
	add eax,ebx
                
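And there would be no problems. Cool eh? :) Each field now has spare zero bits 
right above it (three for red, two for green, three for blue), so r1+r2 (at 
most 62), g1+g2 (at most 126) and b1+b2 (at most 62) all fit without carrying 
into the next field.
But how do you get it rearranged? And then surely once you've added the 2 
together, you have to rearrange it back to the standard RGB format? Yes, yes, 
yes. 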

We have to perform a transformation on EAX and EBX. Luckily for you I've already 
figured out the transformation (and it was fun), and here it is: 
    
	fedcba9876543210fedcba9876543210
	0000000000000000RRRRRGGGGGGBBBBB       --original data
	00000RRRRRGGGGGGBBBBB00000000000       --rol eax,11 ;step1
	00000RRRRRGGGGGG00000000000BBBBB       --shr ax,11  ;step2
	00000000000BBBBB00000RRRRRGGGGGG       --ror eax,16 ;step3
	00000000000BBBBB000RRRRRGGGGGG00       --shl ax,2   ;step4
	00000000000BBBBB000RRRRR00GGGGGG       --shr al,2   ;step5
	000RRRRR00GGGGGG00000000000BBBBB       --rol eax,16 ;step6
	000RRRRR00GGGGGG000BBBBB00000000       --shl ax,8   ;step7
                
And the same thing for EBX: 
    
	fedcba9876543210fedcba9876543210
	0000000000000000RRRRRGGGGGGBBBBB       --original data
	00000RRRRRGGGGGGBBBBB00000000000       --rol ebx,11 ;step1
	00000RRRRRGGGGGG00000000000BBBBB       --shr bx,11  ;step2
	00000000000BBBBB00000RRRRRGGGGGG       --ror ebx,16 ;step3
	00000000000BBBBB000RRRRRGGGGGG00       --shl bx,2   ;step4
	00000000000BBBBB000RRRRR00GGGGGG       --shr bl,2   ;step5
	000RRRRR00GGGGGG00000000000BBBBB       --rol ebx,16 ;step6
	000RRRRR00GGGGGG000BBBBB00000000       --shl bx,8   ;step7
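If you want to check the seven steps without an assembler, here's a C sketch 
that mimics them on a uint32_t standing in for EAX. The helper names (rol32, 
shr_ax, shr_al and so on) are made up for this example; each one only changes 
the part of the register the real instruction would touch:

```c
#include <stdint.h>

/* Hypothetical helpers: a uint32_t stands in for EAX, and each helper
   changes only the bits the real instruction would change. */
static uint32_t rol32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); } /* n in 1..31 */
static uint32_t ror32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); } /* n in 1..31 */
static uint32_t shl_ax(uint32_t x, unsigned n) { return (x & 0xFFFF0000u) | ((x << n) & 0xFFFFu); }
static uint32_t shr_ax(uint32_t x, unsigned n) { return (x & 0xFFFF0000u) | ((x & 0xFFFFu) >> n); }
static uint32_t shr_al(uint32_t x, unsigned n) { return (x & 0xFFFFFF00u) | ((x & 0xFFu) >> n); }

/* The seven steps from the diagram:
   0000000000000000RRRRRGGGGGGBBBBB -> 000RRRRR00GGGGGG000BBBBB00000000 */
static uint32_t spread565(uint16_t pixel)
{
    uint32_t eax = pixel;    /* top half zero, as after xor eax,eax + mov ax,[...] */
    eax = rol32(eax, 11);    /* step 1: rol eax,11 */
    eax = shr_ax(eax, 11);   /* step 2: shr ax,11  */
    eax = ror32(eax, 16);    /* step 3: ror eax,16 */
    eax = shl_ax(eax, 2);    /* step 4: shl ax,2   */
    eax = shr_al(eax, 2);    /* step 5: shr al,2   */
    eax = rol32(eax, 16);    /* step 6: rol eax,16 */
    eax = shl_ax(eax, 8);    /* step 7: shl ax,8   */
    return eax;
}
```

The result is the same dword the direct formula (r << 24) | (g << 16) | (b << 8) 
would give; the rotates just build it in place inside the one register, with no 
scratch register needed. For example spread565(0xFFFF) is 0x1F3F1F00.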
                
Ok, cool, so now we have 2 dwords, ready to add, and they won't overflow :) So 
you can either add them and then divide by 2 for transparency, or you can do a 
cool thing: add them and clip each channel to its maximum value. Like: 
    
	r=r1+r2;
	g=g1+g2;
	b=b1+b2;
	if (r>31) r=31;
	if (g>63) g=63;
	if (b>31) b=31;

In assembly, right after the add EAX looks like 00RRRRRR0GGGGGGG00BBBBBB00000000 
(6-bit red and blue sums, a 7-bit green sum, low byte still zero), so:

	                    ;must clip to 31, 63, 31:
	   ror eax,8        ;---> 0000000000RRRRRR0GGGGGGG00BBBBBB
	   cmp al,31
	    jle @blue_is_cool
	    mov al,31
	    @blue_is_cool:
	
	   ror eax,8        ;---> 000BBBBB0000000000RRRRRR0GGGGGGG
	   cmp al,63
	    jle @green_is_cool
	    mov al,63
	    @green_is_cool:

	   ror eax,8        ;---> 00GGGGGG000BBBBB0000000000RRRRRR
	   cmp al,31
	    jle @red_is_cool
	    mov al,31
	    @red_is_cool:
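Here's the add and the three rotate-and-clip steps as a C sketch, again with a 
uint32_t standing in for EAX (set_al and add_and_clip are made-up names). The C 
compare is unsigned while jle is a signed compare, but the biggest possible sum, 
126, is still a positive signed byte, so the two agree here:

```c
#include <stdint.h>

/* Hypothetical helpers: a uint32_t stands in for EAX. */
static uint32_t ror32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); } /* n in 1..31 */
static uint32_t set_al(uint32_t x, uint8_t v) { return (x & 0xFFFFFF00u) | v; }      /* mov al,v */

/* Add two spread pixels (000RRRRR00GGGGGG000BBBBB00000000 each) and clip
   blue, green and red to 31, 63, 31: rotate the next field into AL,
   compare, clamp. */
static uint32_t add_and_clip(uint32_t eax, uint32_t ebx)
{
    eax += ebx;                                      /* add eax,ebx: fields can't collide */

    eax = ror32(eax, 8);                             /* blue now in AL  */
    if ((eax & 0xFFu) > 31) eax = set_al(eax, 31);

    eax = ror32(eax, 8);                             /* green now in AL */
    if ((eax & 0xFFu) > 63) eax = set_al(eax, 63);

    eax = ror32(eax, 8);                             /* red now in AL   */
    if ((eax & 0xFFu) > 31) eax = set_al(eax, 31);

    /* Three ror eax,8 = rotated right by 24 bits in total, so the layout
       is now 00GGGGGG000BBBBB00000000000RRRRR. */
    return eax;
}
```

Adding two full-white spread pixels, add_and_clip(0x1F3F1F00, 0x1F3F1F00), gives 
0x3F1F001F: green 63 in the top byte, then blue 31, then red 31 at the bottom.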
                
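Cool, so now every channel is in range. But notice that the three ror eax,8 
have rotated EAX right by 24 bits in total, so what EAX actually holds now is:
    
	fedcba9876543210fedcba9876543210
	00GGGGGG000BBBBB00000000000RRRRR         ;in EAX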
                        
And we need to convert it back to normal RGB format, which is just as easy: 
    
	fedcba9876543210fedcba9876543210
	000BBBBB00000000000RRRRR00GGGGGG    --- rol eax,8      ;1
	000BBBBB00000000000RRRRRGGGGGG00    --- shl al,2       ;2
	00000000000RRRRRGGGGGG00000BBBBB    --- rol eax,8      ;3
	00000000000RRRRRGGGGGG00BBBBB000    --- shl al,3       ;4
	0000000000000RRRRRGGGGGG00BBBBB0    --- ror eax,2      ;5
	0000000000000RRRRRGGGGGGBBBBB000    --- shl al,2       ;6
	0000000000000000RRRRRGGGGGGBBBBB    --- shr eax,3      ;7
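The convert-back steps can be simulated the same way (rol32, ror32, shl_al and 
pack565 are hypothetical helper names standing in for the instructions). The 
input is whatever the clipping code leaves in EAX:

```c
#include <stdint.h>

/* Hypothetical helpers: a uint32_t stands in for EAX. */
static uint32_t rol32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); } /* n in 1..31 */
static uint32_t ror32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); } /* n in 1..31 */
static uint32_t shl_al(uint32_t x, unsigned n) { return (x & 0xFFFFFF00u) | ((x << n) & 0xFFu); }

/* The seven convert-back steps. Input: the layout the clipping code leaves,
   00GGGGGG000BBBBB00000000000RRRRR, with every field already in range.
   Output: a normal RGB565 pixel. */
static uint16_t pack565(uint32_t eax)
{
    eax = rol32(eax, 8);    /* 1: rol eax,8 -> 000BBBBB00000000000RRRRR00GGGGGG */
    eax = shl_al(eax, 2);   /* 2: shl al,2  */
    eax = rol32(eax, 8);    /* 3: rol eax,8 */
    eax = shl_al(eax, 3);   /* 4: shl al,3  */
    eax = ror32(eax, 2);    /* 5: ror eax,2 */
    eax = shl_al(eax, 2);   /* 6: shl al,2  */
    eax >>= 3;              /* 7: shr eax,3 -> 0000000000000000RRRRRGGGGGGBBBBB */
    return (uint16_t)eax;   /* only AX gets stored by mov [edi],ax */
}
```

Feeding it 0x3F1F001F (green 63, blue 31, red 31 in that rotated layout) gives 
back 0xFFFF, plain white.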
                
And now we just go: 
    
	mov [edi],ax        ;putpixel
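The other option mentioned earlier, adding and then dividing by 2 for 50% 
transparency, also works on the spread dwords, with one shift for all three 
channels. Here's a C sketch; to keep it short it uses plain shift-and-mask 
formulas in place of the rotate sequences (spread565, unspread565 and average565 
are made-up names). In the register version that would presumably be shr eax,1 
and and eax,1F3F1F00h instead of the clipping code, followed by a rol eax,8 so 
the fields sit where the convert-back steps expect them, since those are written 
for the layout the three ror eax,8 leave behind:

```c
#include <stdint.h>

/* Direct-formula stand-ins for the rotate sequences (same results):
   RGB565 <-> 000RRRRR00GGGGGG000BBBBB00000000 */
static uint32_t spread565(uint16_t p)
{
    return ((((uint32_t)p >> 11) & 0x1F) << 24)
         | ((((uint32_t)p >> 5) & 0x3F) << 16)
         | (((uint32_t)p & 0x1F) << 8);
}

static uint16_t unspread565(uint32_t s)
{
    return (uint16_t)((((s >> 24) & 0x1F) << 11)
                    | (((s >> 16) & 0x3F) << 5)
                    | ((s >> 8) & 0x1F));
}

/* 50% transparency: one add, one shift and one mask for all three channels. */
static uint16_t average565(uint16_t p1, uint16_t p2)
{
    uint32_t sum = spread565(p1) + spread565(p2); /* 6-, 7- and 6-bit sums, no overlap */
    sum >>= 1;                                    /* halves every field at once */
    sum &= 0x1F3F1F00u;                           /* each sum's low bit fell into a gap: drop it */
    return unspread565(sum);
}
```

average565(0xFFFF, 0x0000) gives 0x7BEF, the same answer as splitting the 
channels out one at a time, and like that version it rounds each half down.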
                
So here is a complete example: 
    
	xor eax,eax		;the top halves must start out zero
	xor ebx,ebx
	mov ax,[esi]		;getpixel spritemap
	mov bx,[edi]		;getpixel target for additive layering

	;--------------------------------------

	rol eax,11		;1  conversion 1
	shr ax,11		;2
	ror eax,16		;3
	shl ax,2		;4
	shr al,2		;5
	rol eax,16		;6
	shl ax,8		;7

	rol ebx,11		;1  conversion 2
	shr bx,11		;2
	ror ebx,16		;3
	shl bx,2		;4
	shr bl,2		;5
	rol ebx,16		;6
	shl bx,8		;7

	;--------------------------------------

	add eax,ebx		;add them together!!!

	;--------------------------------------

	ror eax,8
	cmp al,31
	jle @blue_is_cool	;check if overflow - blue?
	mov al,31
@blue_is_cool:

	ror eax,8
	cmp al,63
	jle @green_is_cool	;check if overflow - green?
	mov al,63
@green_is_cool:

	ror eax,8
	cmp al,31
	jle @red_is_cool	;check if overflow - red?
	mov al,31
@red_is_cool:

	;--------------------------------------

	rol eax,8		;1  convert back
	shl al,2		;2
	rol eax,8		;3
	shl al,3		;4
	ror eax,2		;5
	shl al,2		;6
	shr eax,3		;7

	;--------------------------------------

	mov [edi],ax		;putpixel
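And the whole thing strung together, as a C simulation you can check against 
the plain per-channel version of the r=r1+r2 pseudocode. Every function name 
here is made up for the example, and the uint32_t values only stand in for EAX 
and EBX, so this checks the bit logic, not the speed:

```c
#include <stdint.h>

/* Hypothetical helpers: a uint32_t stands in for EAX/EBX. */
static uint32_t rol32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); } /* n in 1..31 */
static uint32_t ror32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); } /* n in 1..31 */
static uint32_t shl_ax(uint32_t x, unsigned n) { return (x & 0xFFFF0000u) | ((x << n) & 0xFFFFu); }
static uint32_t shr_ax(uint32_t x, unsigned n) { return (x & 0xFFFF0000u) | ((x & 0xFFFFu) >> n); }
static uint32_t shl_al(uint32_t x, unsigned n) { return (x & 0xFFFFFF00u) | ((x << n) & 0xFFu); }
static uint32_t shr_al(uint32_t x, unsigned n) { return (x & 0xFFFFFF00u) | ((x & 0xFFu) >> n); }

/* Conversion: RGB565 -> 000RRRRR00GGGGGG000BBBBB00000000 */
static uint32_t spread565(uint16_t p)
{
    uint32_t reg = p;                            /* xor reg,reg + 16-bit load */
    reg = rol32(reg, 11);  reg = shr_ax(reg, 11); /* steps 1-2 */
    reg = ror32(reg, 16);  reg = shl_ax(reg, 2);  /* steps 3-4 */
    reg = shr_al(reg, 2);  reg = rol32(reg, 16);  /* steps 5-6 */
    return shl_ax(reg, 8);                        /* step 7 */
}

/* ror eax,8 / cmp al,max / jle / mov al,max */
static uint32_t ror8_clip(uint32_t x, uint32_t max)
{
    x = ror32(x, 8);
    if ((x & 0xFFu) > max)
        x = (x & 0xFFFFFF00u) | max;
    return x;
}

/* The complete example: saturating additive blend of two RGB565 pixels. */
static uint16_t add565_saturate(uint16_t src, uint16_t dst)
{
    uint32_t eax = spread565(src) + spread565(dst); /* conversions + add eax,ebx */
    eax = ror8_clip(eax, 31);                       /* blue  */
    eax = ror8_clip(eax, 63);                       /* green */
    eax = ror8_clip(eax, 31);                       /* red   */
    eax = rol32(eax, 8);  eax = shl_al(eax, 2);     /* convert back, steps 1-2 */
    eax = rol32(eax, 8);  eax = shl_al(eax, 3);     /* steps 3-4 */
    eax = ror32(eax, 2);  eax = shl_al(eax, 2);     /* steps 5-6 */
    return (uint16_t)(eax >> 3);                    /* step 7, then mov [edi],ax */
}

/* Straightforward per-channel version of the clip pseudocode,
   to check the bit-twiddled one against. */
static uint16_t add565_reference(uint16_t p1, uint16_t p2)
{
    unsigned r = ((p1 >> 11) & 0x1F) + ((p2 >> 11) & 0x1F);
    unsigned g = ((p1 >> 5) & 0x3F) + ((p2 >> 5) & 0x3F);
    unsigned b = (p1 & 0x1F) + (p2 & 0x1F);
    if (r > 31) r = 31;
    if (g > 63) g = 63;
    if (b > 31) b = 31;
    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

add565_saturate(0x0841, 0x0841) doubles a dark pixel (red 1, green 2, blue 1) 
to 0x1082, and anything that would go past full brightness sticks at 31/63/31, 
so white plus white stays 0xFFFF.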
                
It's very speedy, very slick, and it frees up registers. I found that using this 
technique I could keep everything else out of the inner loop, and I only had 
about 3 more instructions in the inner loop. Anyway, lemme know what you think. 
Perhaps this is old hat :) 

-Rawhed/Sensory Overload
-Mailto:andrew@overload.co.za
-Http://www.overload.co.za
-Andrew Griffiths
-South Africa
-20-05-1999