Understanding 6502 assembly on the Commodore 64 - (14) Space and cycle optimization

On the C64 and other 8-bit computers, where both speed and storage space are limited, it is important to make code as small and as fast as possible. Our simple binary conversion requires neither, but it's good to see how far we can reduce the size of the code and the number of cycles required to execute it. I did not include every possible optimization the world has to offer, so please do not contact me with your idea. This chapter is primarily about optimizing the flow of the program, not about techniques that might confuse beginning 6502 programmers.


You are, however, welcome to share your thoughts or techniques below. I will remind people that contacting someone online does not excuse you from having good manners. Interact with people as you would with a stranger in person.








This is our program as it was. I've left the NOP in so as not to change our initial results from chapter 13. We will also have to replace RTS with BRK to count cycles accurately in Virtual 6502.


; C64 Hex to Binary display converter
; 64TASS assembler style code for 6502
; Jordan Rubin 2014 http://technocoma.blogspot.com
;
; Takes the HEX value in OURHEXVALUE, converts it to binary for display
; on the screen as a binary number.   
   
*=$C000 ; SYS 49152 to begin

OURHEXVALUE = #$55 ; Enter the Hex value to be converted here

OURHEXNUM = $033C  ; This is where the constant OURHEXVALUE will be stored
TESTBYTE = $0345   ; This is where our test byte will be stored for lsr
BIT7 = $0708       ; This is the location of the 7th bit, requires room for
                   ; 8 contiguous bytes after the starting address
                   ; using 0708 dumps it right to screen ram, bottom center

nop


INIT:

lda OURHEXVALUE     ; this will be our test number
sta OURHEXNUM       ; we will store the test number here permanently
ldy #$80            ; Our first bit test for bit 7 must be 10000000 ($80)
sty TESTBYTE        ; store our initial test byte here
ldx #$00            ; Initialize X for our loop

CONVERTION:

lda OURHEXNUM   ; load our test hex number, this is a constant
and TESTBYTE    ; mask it with our test byte
cmp #$00        ; is the result 00?
bne STORE1      ; No, branch to STORE1
beq STORE0      ; Yes, branch to STORE0
CONTINUE:
inx            ; Increment X for our loop
lda TESTBYTE   ; load testbyte into A
lsr            ; divide it by 2
sta TESTBYTE   ; store new testbyte back to its memory area
cpx #$08       ; is X=8?
bne CONVERTION ; No, loop back to CONVERTION
brk

STORE0:

lda #$30       ; Load the display value of 0 into A
sta BIT7,x     ; store A to the current storage memory location
jmp CONTINUE   ; jump to CONTINUE

STORE1:

lda #$31       ; Load the display value of 1 into A
sta BIT7,x     ; store A to the current storage memory location
jmp CONTINUE   ; jump to CONTINUE


Writing optimized code from the start can be daunting. It may be better to write working code first, such as that above, and then optimize it.

Our current code, as shown, occupies memory from $C000 to $C035 [54 bytes].
Executing it in Virtual 6502 shows that it requires 349 cycles to complete.

ea a9 55 8d 3c 03 a0 80 8c 45 03 a2 00 ad 3c 03 2d 45 03 c9 00 d0 17 f0 0d e8 ad 45 03 4a 8d 45 03 e0 08 d0 e8 60 a9 30 9d 08 07 4c 19 c0 a9 31 9d 08 07 4c 19 c0






Let's see if we can improve on this. It's not a big program, so there isn't much to do, but there is enough. Remember that we could save a byte and two cycles by getting rid of the NOP, but we'll keep it so we can view the code in the monitor. Let's move on to the real stuff.

INIT:
      This is rather straightforward, with no waste, so we'll leave it alone.

STORE0: and STORE1:

     Both have the instruction sta BIT7,x just before jumping to CONTINUE. Why not move that instruction to the top of CONTINUE, since both STORE0 and STORE1 execute it anyway? This won't save us any cycles, but it will save us some space.

[OLD]

CONVERTION:
lda OURHEXNUM   ; load our test hex number, this is a constant
and TESTBYTE    ; mask it with our test byte
cmp #$00        ; is the result 00?
bne STORE1      ; No, branch to STORE1
beq STORE0      ; Yes, branch to STORE0

CONTINUE:

inx            ; Increment X for our loop
lda TESTBYTE   ; load testbyte into A
lsr            ; divide it by 2
sta TESTBYTE   ; store new testbyte back to its memory area
cpx #$08       ; is X=8?
bne CONVERTION ; No, loop back to CONVERTION
brk

STORE0:

lda #$30       ; Load the display value of 0 into A
sta BIT7,x     ; store A to the current storage memory location
jmp CONTINUE   ; jump to CONTINUE

STORE1:

lda #$31       ; Load the display value of 1 into A
sta BIT7,x     ; store A to the current storage memory location
jmp CONTINUE   ; jump to CONTINUE



[NEW]

CONVERTION:
lda OURHEXNUM   ; load our test hex number, this is a constant
and TESTBYTE    ; mask it with our test byte
cmp #$00        ; is the result 00?
bne STORE1      ; No, branch to STORE1
beq STORE0      ; Yes, branch to STORE0

CONTINUE:

sta BIT7,x     ; store A to the current storage memory location
inx            ; Increment X for our loop
lda TESTBYTE   ; load testbyte into A
lsr            ; divide it by 2
sta TESTBYTE   ; store new testbyte back to its memory area
cpx #$08       ; is X=8?
bne CONVERTION ; No, loop back to CONVERTION
brk

STORE0:

lda #$30       ; Load the display value of 0 into A
jmp CONTINUE   ; jump to CONTINUE

STORE1:

lda #$31       ; Load the display value of 1 into A
jmp CONTINUE   ; jump to CONTINUE



Now that this is done, let's look further at our code. We can see in CONVERTION that two possible branches exist: STORE0 or STORE1.

CONVERTION:
lda OURHEXNUM   ; load our test hex number, this is a constant
and TESTBYTE    ; mask it with our test byte
cmp #$00        ; is the result 00?
bne STORE1      ; No, branch to STORE1
beq STORE0      ; Yes, branch to STORE0


If there are only two possibilities, it seems like a waste to branch to both based on our test when we can have a normal program flow and make one exception.

Why not keep our exception, bne STORE1, and put our STORE0 code right below it so that execution simply continues on?

We'll have to move the code so that the STORE0 code is directly under CONVERTION.

CONVERTION:
lda OURHEXNUM   ; load our test hex number, this is a constant
and TESTBYTE    ; mask it with our test byte
cmp #$00        ; is the result 00?
bne STORE1      ; No, branch to STORE1
lda #$30       ; Load the display value of 0 into A
jmp CONTINUE   ; jump to CONTINUE

STORE1:

lda #$31       ; Load the display value of 1 into A

CONTINUE:

sta BIT7,x     ; store A to the current storage memory location
inx            ; Increment X for our loop
lda TESTBYTE   ; load testbyte into A
lsr            ; divide it by 2
sta TESTBYTE   ; store new testbyte back to its memory area
cpx #$08       ; is X=8?
bne CONVERTION ; No, loop back to CONVERTION
brk

Essentially there is no more STORE0 routine. In our new program flow, A will be $30 unless the code branched to STORE1, which makes A $31. Both CONVERTION and STORE1 ultimately lead to CONTINUE.
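
As a rough check on where the space goes, the bytes removed by the two changes can be tallied from the instruction encodings in the original hex dump. This is a hand count, not assembler output:

change 1: net one copy of sta BIT7,x removed   9d 08 07   3 bytes
change 2: beq STORE0 removed                   f0 0d      2 bytes
          one jmp CONTINUE removed             4c 19 c0   3 bytes
                                               total      8 bytes

Starting from 54 bytes, that leaves 54 - 8 = 46 bytes.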

Let's look at the final optimized program. We kept the NOP in for a fair comparison between the old and new code.

; C64 Hex to Binary display converter optimized
; 64TASS assembler style code for 6502
; Jordan Rubin 2014 http://technocoma.blogspot.com
;
; Takes the HEX value in OURHEXVALUE, converts it to binary for display
; on the screen as a binary number.   
   
*=$C000 ; SYS 49152 to begin

OURHEXVALUE = #$55 ; Enter the Hex value to be converted here

OURHEXNUM = $033C  ; This is where the constant OURHEXVALUE will be stored
TESTBYTE = $0345   ; This is where our test byte will be stored for lsr
BIT7 = $0708       ; This is the location of the 7th bit, requires room for
                   ; 8 contiguous bytes after the starting address
                   ; using 0708 dumps it right to screen ram, bottom center

nop


INIT:

lda OURHEXVALUE ; this will be our test number
sta OURHEXNUM   ; we will store the test number here permanently
ldy #$80        ; Our first bit test for bit 7 must be 10000000 ($80)
sty TESTBYTE    ; Store our initial test byte here
ldx #$00        ; Initialize X for our loop

CONVERTION:

lda OURHEXNUM   ; load our test hex number, this is a constant
and TESTBYTE    ; mask it with our test byte
cmp #$00        ; is the result 00?
bne STORE1      ; No, branch to STORE1
lda #$30        ; Load the display value of 0 into A
jmp CONTINUE    ; jump to CONTINUE

STORE1:

lda #$31       ; Load the display value of 1 into A

CONTINUE:

sta BIT7,x     ; store A to the current storage memory location
inx            ; Increment X for our loop
lda TESTBYTE   ; load testbyte into A
lsr            ; divide it by 2
sta TESTBYTE   ; store new testbyte back to its memory area
cpx #$08       ; is X=8?
bne CONVERTION ; No, loop back to CONVERTION
brk

Our optimized code, as shown, occupies memory from $C000 to $C02D [46 bytes].
Executing it in Virtual 6502 shows that it requires 319 cycles to complete.


ea a9 55 8d 3c 03 a0 80 8c 45 03 a2 00 ad 3c 03 2d 45 03 c9 00 d0 05 a9 30 4c 1e c0 a9 31 9d 08 07 e8 ad 45 03 4a 8d 45 03 e0 08 d0 e0 00


We put our code in and change the start address. Then click load memory.






Then we click show memory, click the green PC, and change it to C000.



Clicking continuous run, we see how many cycles are required before it breaks.




So, completing the same function, we went from

[54 bytes] 349 Cycles to complete
to
[46 bytes] 319 Cycles to complete

That reduces the code size by 8 bytes (almost 15%) and the processor cycles by 30 (about 8.6%).
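
A hand count with the standard 6502 instruction timings shows where the cycles go. Moving sta BIT7,x changes nothing; the saving comes from the reworked branch. The tally below covers one pass through the loop and counts only the instructions that differ. It assumes Virtual 6502 uses the usual published timings, and that no branch crosses a page boundary, which holds here because all the code sits in page $C0:

bit is 0, old:  bne (not taken) 2 + beq (taken) 3 + lda # 2 + sta abs,x 5 + jmp 3 = 15
bit is 0, new:  bne (not taken) 2 + lda # 2 + jmp 3 + sta abs,x 5               = 12
bit is 1, old:  bne (taken) 3 + lda # 2 + sta abs,x 5 + jmp 3                   = 13
bit is 1, new:  bne (taken) 3 + lda # 2 + sta abs,x 5                           = 10

Either way a pass is 3 cycles shorter, so the eight passes save 24 cycles. The other 6 cycles of the measured difference probably do not come from the optimization: the first hex dump still ends its main routine with $60 (RTS, 6 cycles), while the second ends with $00 (BRK), so the two runs appear to have been stopped by different instructions.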




NEXT----->
Understanding 6502 assembly on the Commodore 64 - (15) The Zero Page
