MagicEngine
Forums
 
operating under the T flag

Tomaitheous
Elder

Joined: 27 Sep 2005
Posts: 306
Location: Tucson

Posted: Sat May 20, 2006 5:40 am    Post subject: operating under the T flag

A few questions about the T flag.

Are there additional cycles attached to AND, OR, ADC, etc instructions when the T flag is set?

Does the value already existing in the A register automatically transfer into the ZP address on setting the T flag?

How do the non-logic instructions react under the T flag (e.g. tay, tam)?



-Rich
_________________
www.pcedev.net

Charles MacDonald
Member

Joined: 07 Dec 2005
Posts: 35

Posted: Sun May 21, 2006 4:24 am    Post subject: Re: operating under the T flag

Quote:
Are there additional cycles attached to AND, OR, ADC, etc instructions when the T flag is set?


I would imagine at the very least 2 cycles are added to read the zero page byte before the operation and write it back afterwards.

Quote:
Does the value already existing in the A register automatically transfer into the ZP address on setting the T flag?


Not quite. When a logic instruction is prefixed by SET, the operation that would take place on the accumulator is instead done to a zero page memory location indexed by the X register. The accumulator is not affected in any way.

Quote:

How do the non-logic instructions react under the T flag (e.g. tay, tam)?


All other instructions operate normally and are not affected by the T flag being set. Note that because of this you have to be careful of the order in which the T flag is set:

ldx #$80
clc
set
adc #$01 ; OK, adds 1 to memory address $2080

as opposed to

ldx #$80
set
clc ; T flag cleared here
adc #$01 ; adds 1 to accumulator, oops
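
A rough Python model of the two sequences above may make the ordering issue easier to see. It is only a sketch of the behaviour described in this post, not an emulator: the `Cpu` class and its method names are made up for illustration, decimal mode and the overflow flag are ignored, and zero page is modelled as a 256-byte array standing in for $2000-$20FF.

```python
class Cpu:
    """Toy model of the HuC6280 T flag; all names here are illustrative."""

    def __init__(self):
        self.a = 0                  # accumulator
        self.x = 0                  # X index register
        self.carry = False
        self.t = False              # T flag, set by SET
        self.zp = bytearray(256)    # zero page ($2000-$20FF on the PCE)

    def ldx(self, value):
        self.x = value & 0xFF
        self.t = False              # any instruction other than SET clears T

    def clc(self):
        self.carry = False
        self.t = False

    def set(self):
        self.t = True               # only affects the instruction that follows

    def adc(self, value):
        # With T set the operand is added to zp[X]; otherwise to A.
        use_zp = self.t
        target = self.zp[self.x] if use_zp else self.a
        total = target + (value & 0xFF) + int(self.carry)
        self.carry = total > 0xFF
        if use_zp:
            self.zp[self.x] = total & 0xFF   # A is left untouched
        else:
            self.a = total & 0xFF
        self.t = False


good = Cpu()
good.ldx(0x80); good.clc(); good.set(); good.adc(0x01)

bad = Cpu()
bad.ldx(0x80); bad.set(); bad.clc(); bad.adc(0x01)

print(hex(good.zp[0x80]), hex(good.a))   # 0x1 0x0
print(hex(bad.zp[0x80]), hex(bad.a))     # 0x0 0x1
```

In the second run the CLC clears T before the ADC executes, so the addition lands in the accumulator instead of memory.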

Tomaitheous

Posted: Sun May 21, 2006 7:14 pm

Looking through the instruction set I see that every instruction clears the T flag. So I guess that doesn't make it as useful as I thought. I should have just looked at your doc as it has all the info I was looking for :oops:

Is the use of the T flag and the behavior of the corresponding instruction officially documented? Or is this similar to the case of the Hitachi 6309 native mode not being documented until years later?

I almost forgot. Charles, in your 'pcetech' you mentioned that there is one additional clock cycle/wait state added to any instruction that accesses the VDC and VCE. Is this factored into the block transfer instructions rated at 6 clock cycles per byte, or is it actually taking 7 clock cycles per byte when pointed to the VDC/VCE?


-Rich
_________________
www.pcedev.net

Charles MacDonald

Posted: Sun May 21, 2006 10:28 pm

Quote:
Is the use of the T flag and the behavior of the corresponding instruction officially documented? Or is this similar to the case of the Hitachi 6309 native mode not being documented until years later?


Yes, the function of the T flag (and all the non-standard features added to the HuC6280 on top of the base 65C02 feature set) has been documented in the Develo Book and in the PC-Engine developer manuals. Actually getting hold of either one is another story, so for regular people like myself there is a bit of mystery surrounding such things. :)

Quote:
I almost forgot. Charles, in your 'pcetech' you mentioned that there is one additional clock cycle/wait state added to any instruction that accesses the VDC and VCE. Is this factored into the block transfer instructions rated at 6 clock cycles per byte, or is it actually taking 7 clock cycles per byte when pointed to the VDC/VCE?


I can't quite figure this one out. Way back when in a timing test I found that any access to the entire $0000-$03FF range (VDC) or $0400-$07FF range (VCE) took one extra cycle, whether it be a read or write, for program execution or just regular memory access.

Recently I've been doing some work on the hardware side and found that the VDC can control the HuC6280's WAIT signal to delay processing. However it seems to only do that during VRAM reads and writes. That explains where the extra cycle comes from, but then you'd think addresses like $0000/$0001 wouldn't be affected - they too have a 1 cycle delay. Plus the VCE can't even control WAIT so the VCE delay seems impossible.

Basically I need to do more testing before I can give a definitive answer. ;) However I think it is highly likely there is one extra cycle for every transfer, so it would be 7 clock cycles per byte. I'll check all this stuff Real Soon Now.

Tomaitheous

Posted: Tue May 23, 2006 5:13 am

Quote:
I'll check all this stuff Real Soon Now.


Cool! I guess I'll try some clock tests too since I'm curious if there's any delay difference between the SGX and PCE when accessing the VDC/VCE.

Quote:
Plus the VCE can't even control WAIT so the VCE delay seems impossible.


Is this tested on a PCE and SGX?
_________________
www.pcedev.net

Charles MacDonald

Posted: Mon May 29, 2006 6:42 pm

Tomaitheous wrote:

Cool! I guess I'll try some clock tests too since I'm curious if there's any delay difference between the SGX and PCE when accessing the VDC/VCE.


I can confirm as a matter of fact that any access to $0000-$03FF (VDC) or $0400-$07FF (VCE) takes exactly 1 extra cycle each. For example LDA $0000 is 6 cycles instead of 5 on a regular CoreGrafx II (standard PCE chipset). I'll dig up the SuperGrafx and check on that later.

Note that this extra cycle penalty is for each access to a VDC/VCE address; so a RMW instruction would be +2 cycles (+1 for read and +1 for write), a block transfer instruction would be +1 for each source read and/or +1 for each destination write that access those areas.
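
The rule above can be written down as a small Python sketch. The function names are illustrative, the addresses are the logical ones used in this thread, and the base figures (5 cycles for LDA absolute, 6 cycles per block-transfer byte) are simply the numbers quoted in the posts, so treat this as a restatement rather than a verified timing table.

```python
VDC_RANGE = range(0x0000, 0x0400)   # VDC addresses
VCE_RANGE = range(0x0400, 0x0800)   # VCE addresses


def access_penalty(address):
    """Extra cycles for one read or write at `address` (+1 inside VDC/VCE)."""
    return 1 if address in VDC_RANGE or address in VCE_RANGE else 0


def lda_absolute_cycles(address, base=5):
    """LDA abs: one read, so at most +1."""
    return base + access_penalty(address)


def rmw_extra_cycles(address):
    """Read-modify-write touches the address twice: +1 read, +1 write."""
    return 2 * access_penalty(address)


def block_transfer_cycles_per_byte(source, destination, base=6):
    """Per-byte cost: +1 if the source is VDC/VCE, +1 if the destination is."""
    return base + access_penalty(source) + access_penalty(destination)


print(lda_absolute_cycles(0x0000))                      # 6
print(rmw_extra_cycles(0x0400))                         # 2
print(block_transfer_cycles_per_byte(0x2200, 0x0002))   # 7
print(block_transfer_cycles_per_byte(0x2200, 0x2300))   # 6
```

The $2200 and $2300 addresses are just arbitrary RAM locations picked for the example.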

I've got this great system for automatically timing code sequences working at the moment. Are there any other timings you wanted checked? I was going to look at CSL/CSH and the overhead from having the T flag set.

Tomaitheous

Posted: Tue May 30, 2006 12:29 am

Quote:
Note that this extra cycle penalty is for each access to a VDC/VCE address; so a RMW instruction would be +2 cycles (+1 for read and +1 for write), a block transfer instruction would be +1 for each source read and/or +1 for each destination write that access those areas.


Hmm, that has me thinking - is the +1 penalty for writing to the VDC per byte or per word? I guess that would depend on which side the latch was on. If it's on the CPU side then it should be +1 for each word? Same with the VCE?

I'll have to check again, but I remember having a test demo that could write about 5-6 colors to the VCE before the VDC goes active on a scanline - by starting the block transfer at the beginning of the h-sync interrupt (@ 5 MHz with a centered 256 active pixel setup). Even without the +1 delay on each access, it should only be able to update 2-3 colors at the most. Unless the hsync interrupt was/is being generated before the start of the next line - maybe at the end of the active display of the VDC of the previous line. Any ideas?

Also, is the CSH/CSL really changing the clock speed or just inserting/enabling wait states?

-Rich
_________________
www.pcedev.net

Tomaitheous

Posted: Sun Jun 04, 2006 10:12 pm

Maybe there is a good but limited use for the T flag after all :)


Code:

8/16bit add to 16bit var$ in zeropage

lda ZZ        ; 4
clc           ; 2
adc #$xx      ; 2
sta ZZ        ; 4
lda ZZ+1      ; 4
adc #$xx      ; 2
sta ZZ+1      ; 4   total 22 cycles

And now with the T flag.

ldx #LOW(ZZ label)   ; 2
clc                  ; 2
set                  ; 2
adc #$xx             ; 2
inx                  ; 2
set                  ; 2
adc #$xx             ; 2   total 14 cycles


incrementing an 8bit pointer for a 16bit wide array

inc ZZ               ; 6
inc ZZ               ; 6   total 12 cycles

ldx #LOW(ZZ label)   ; 2
clc                  ; 2
set                  ; 2
adc #$02             ; 2   total 8 cycles


-Rich
_________________
www.pcedev.net

dmichel
Admin

Joined: 04 Apr 2002
Posts: 1166
Location: France

Posted: Mon Jun 05, 2006 7:19 pm

You forgot to add the extra cycles to 'adc' when the T flag is set. :P
(the T flag adds 3 cycles)

I think I never used 'set' but it's true that it could be useful at times.
_________________
David Michel

Tomaitheous

Posted: Tue Jun 06, 2006 1:27 am

Quote:
You forgot to add the extra cycles to 'adc' when the T flag is set.
(the T flag adds 3 cycles)


So ADC #$xx under T flag is 5 cycles instead of 2?

From the looks of it, you save 1 cycle and 1 byte for every use of the T flag version.
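
Redoing the totals from the earlier listings with the quoted T-flag penalty can be sketched in a few lines of Python. It only re-adds the per-instruction counts already posted in this thread; `T_PENALTY` and the tuple layout are illustrative names, and the penalty is left as a parameter since its exact value is what is being discussed here.

```python
T_PENALTY = 3   # extra cycles per T-flag ADC, the figure quoted above

# (mnemonic, base cycles, executes under the T flag?)
normal_add16 = [("lda", 4, False), ("clc", 2, False), ("adc", 2, False),
                ("sta", 4, False), ("lda", 4, False), ("adc", 2, False),
                ("sta", 4, False)]
t_flag_add16 = [("ldx", 2, False), ("clc", 2, False), ("set", 2, False),
                ("adc", 2, True), ("inx", 2, False), ("set", 2, False),
                ("adc", 2, True)]
normal_inc2 = [("inc", 6, False), ("inc", 6, False)]
t_flag_inc2 = [("ldx", 2, False), ("clc", 2, False), ("set", 2, False),
               ("adc", 2, True)]


def total_cycles(sequence, penalty=T_PENALTY):
    """Sum base cycles, adding the penalty to each T-flag instruction."""
    return sum(cycles + (penalty if under_t else 0)
               for _, cycles, under_t in sequence)


print(total_cycles(normal_add16), total_cycles(t_flag_add16))   # 22 20
print(total_cycles(normal_inc2), total_cycles(t_flag_inc2))     # 12 11
```

With a penalty of 3 the 16-bit add drops from 22 to 20 cycles and the pointer increment from 12 to 11, which matches the one cycle saved per use mentioned above.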


Btw-
Interesting piece of info (for me at least :lol:) - according to WDC, for the W65C02S addressing mode 'READ-MODIFY-WRITE', add 2 cycles, but later on it mentions 3 cycles - probably 1 extra clock cycle for crossing a page boundary (PC fetch data), like with R-M-W absolute addressing. The W65C02S looks to be closer to the HuC6280 than the 65C02. This would make sense of _Bnu's cycle reference discrepancy in Warren Wilkinson's doc.

Hmm, it would be worth trying to copy code into ram($2000) and execute a test loop for LDA ZZ w/ a 16bit incrementer and compare it to the TIMER difference VS executing code not in the same bank as zeropage.

*UPDATE*
I wrote the test program and tested it on an SGX and a PCE; there is no speed difference when accessing ZP across a page boundary.

I set the timer loop to 16384 cycles ($10 @ $C00) and incremented a 16-bit counter in zeropage until the timer interrupt occurred.

Code:

    MagicEngine     SGX/PCE

*normal ZP code

ram     0x23D        0x244
rom     0x23F        0x244

*T flag code

ram     0x268        0x26E
rom     0x252        0x26E


Hmm, a quick look at the math showed the T flag on the real hardware is +2 cycles for ADC,AND, etc.
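
For anyone wanting to redo the math, the counts in the table can be turned into an approximate cycle budget per loop iteration. This Python sketch assumes the whole 16384-cycle window is spent in the counting loop (setup and interrupt latency are ignored), and the loop bodies themselves are not shown in this thread, so the per-iteration figures are rough at best. The dictionary layout is only for illustration.

```python
WINDOW_CYCLES = 16384   # timer period quoted above

# Counter values from the table, keyed by (code variant, where the code ran).
counts = {
    ("normal", "ram"): {"MagicEngine": 0x23D, "SGX/PCE": 0x244},
    ("normal", "rom"): {"MagicEngine": 0x23F, "SGX/PCE": 0x244},
    ("t_flag", "ram"): {"MagicEngine": 0x268, "SGX/PCE": 0x26E},
    ("t_flag", "rom"): {"MagicEngine": 0x252, "SGX/PCE": 0x26E},
}

for (variant, location), row in counts.items():
    for platform, iterations in row.items():
        per_iteration = WINDOW_CYCLES / iterations
        print(f"{variant:6} {location} {platform:11} "
              f"{iterations:4d} iterations, ~{per_iteration:.2f} cycles each")
```

The hardware column is identical for RAM and ROM in both variants, which is what the no-speed-difference conclusion above rests on; only the MagicEngine column varies with where the code runs.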
_________________
www.pcedev.net

dmichel

Posted: Tue Jun 06, 2006 11:58 am

Tomaitheous wrote:

*UPDATE*
I wrote the test program and tested it on an SGX and a PCE; there is no speed difference when accessing ZP across a page boundary.


Yup, the PCE doesn't have this one-cycle penalty.

Quote:

...
Hmm, a quick look at the math showed the T flag on the real hardware is +2 cycles for ADC,AND, etc.


Interesting... I got the 3-cycles info from the Develo Book, maybe they made a mistake. It's even more useful then. :)
_________________
David Michel

Tomaitheous

Posted: Wed Jun 07, 2006 5:32 am

Quote:
Interesting... I got the 3-cycles info from the Develo Book


Yeah, my quick math was inaccurate :oops: It's +3 to ADC :P
_________________
www.pcedev.net

Charles MacDonald

Posted: Sat Jun 10, 2006 6:30 am

Quote:
Hmm, that has me thinking - is the +1 penalty for writing to the VDC per byte or per word? I guess that would depend on which side the latch was on. If it's on the CPU side then it should be +1 for each word? Same with the VCE?


It's not really specific to a particular VDC/VCE address, like the data port (which as you said has a latch on the LSB). The penalty happens for any access at all: addresses like $0000, $0001, $0400, $0407, $07FF, etc.

Quote:
Unless the hsync interrupt was/is being generated before the start of the next line - maybe at the end of the active display of the VDC of the previous line. Any ideas?


The line interrupt trigger point is relative to /HSYNC, but in the PCE the VCE provides that signal rather than the VDC. I haven't checked the timing details for this yet. For a VDC-only configuration where it generates the timing signals itself, I think the line interrupt happened fairly early in a scanline to give the most amount of time possible for register changes before the next line.

Quote:
Also, is the CSH/CSL really changing the clock speed or just inserting/enabling wait states?


I guess you could say the internal clock speed changes. The HuC6280 is connected to a 21 MHz clock, and the CSL/CSH instructions select whether the clock signal, divided by 12 or 3 respectively, is used for the 'CPU core' part of the chip. The PSG and timer have their own independent clocks too. If you are interested, US patent 5,483,659 discusses this in detail.
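
As a quick check of those dividers, here is a small Python sketch. It assumes the "21 MHz" figure refers to the usual 21.47727 MHz NTSC master clock (six times the colour subcarrier); that exact value is an assumption, not something stated in this post, and the names are illustrative.

```python
MASTER_CLOCK_HZ = 21_477_270   # assumed NTSC master clock (6 x 3.579545 MHz)

DIVIDERS = {"CSL": 12, "CSH": 3}   # divider selected by each instruction


def cpu_clock_hz(instruction):
    """CPU core clock after CSL or CSH, under the assumed master clock."""
    return MASTER_CLOCK_HZ / DIVIDERS[instruction]


for name in ("CSL", "CSH"):
    print(f"{name}: {cpu_clock_hz(name) / 1e6:.3f} MHz")
# CSL: 1.790 MHz
# CSH: 7.159 MHz
```

Whatever the exact master clock is, the ratio between the two modes is 4:1.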

Tomaitheous

Posted: Mon Jun 12, 2006 6:59 pm

Quote:
The line interrupt trigger point is relative to /HSYNC, but in the PCE the VCE provides that signal rather than the VDC.


Since the VDC flags the interrupt, it must somehow know/assume where the HSYNC is even though it's not generating it, based on REGs $0A-$0E as if it were generating the HSYNC itself.
_________________
www.pcedev.net

Charles MacDonald

Posted: Mon Jun 12, 2006 7:44 pm

Tomaitheous wrote:
Quote:
The line interrupt trigger point is relative to /HSYNC, but in the PCE the VCE provides that signal rather than the VDC.


Since the VDC flags the interrupt, it must somehow know/assume where the HSYNC is even though it's not generating it, based on REGs $0A-$0E as if it were generating the HSYNC itself.


Yeah, I should have been more clear about that. In the PCE, the VDC is set up so its /HSYNC and /VSYNC pins are inputs, rather than outputs. It then synchronizes itself to whatever supplies those signals, in this case the VCE.

This comes up in odd places, such as if a VD interrupt isn't generated during the current frame, the VDC forces one when the VCE asserts /VSYNC. Raster interrupts *may* work the same way relating to /HSYNC, though for the sake of simplicity I really hope not.

All times are GMT + 1 Hour
Page 1 of 2