Summary | Radeon: evergreen Atombios in loop during initialization on ppc64
---|---
Product | DRI
Component | DRM/Radeon
Status | RESOLVED WONTFIX
Severity | normal
Priority | medium
Version | XOrg git
Hardware | Other
OS | All
Reporter | Lucas Kannebley Tavares <lucaskt>
Assignee | Default DRI bug account <dri-devel>
CC | brking, lucaskt
See Also | https://bugs.freedesktop.org/show_bug.cgi?id=59672

Description
Lucas Kannebley Tavares
2013-01-28 18:37:52 UTC
Could you please attach your video BIOS to the bug?

    cd /sys/bus/pci/devices/0000:01:05.0
    sudo sh -c "echo 1 > rom"
    sudo sh -c "cat rom > ~/bios.rom"

Something like that should do the trick; just change the PCI ID.

Hi Jerome, I attempted the dump without success:

    [root@localhost ~]# lspci
    ...
    0001:01:00.0 VGA compatible controller: ATI Technologies Inc
    ...
    0001:01:00.1 Audio device: ATI Technologies Inc
    ...
    [root@localhost ~]# cd /sys/bus/pci/devices/0001:01:00.0
    [root@localhost 0001:01:00.0]# echo 1 > rom
    [root@localhost 0001:01:00.0]# cat rom > ~/bios.rom
    [ 588.381813] pci 0001:01:00.0: Invalid ROM contents
    cat: rom: Input/output error
    [root@localhost 0001:01:00.0]# lspci
    [ 637.187942] Kernel panic - not syncing: FAIL
    ...
    [ 637.190672] ===============================
    [ 637.190677] [ INFO: suspicious RCU usage. ]
    [ 637.190683] 3.8.0-rc5-kotd+ #6 Not tainted
    [ 637.190687] -------------------------------
    [ 637.190692] include/linux/rcupdate.h:468 Illegal context switch in RCU read-side critical section!
    ...

I'm investigating what was wrong with the ROM dump, but if you have any ideas what it could be, any help would be appreciated :)

As a side note, the kernel panic is actually induced by me. It means there was an access to an invalid address.

Can you try dumping the BIOS when booting with KMS disabled and nothing bound to the GPU?

Created attachment 73987 [details]
BIOS for the adapter
Here's the dump you requested, thanks for the reminder on modeset, was trying to achieve it via a pci_enable/io_remap on a quirk.
Created attachment 73997 [details] [review]
possible fix

For some reason the current crtc enabled bit isn't going low. It should be low already if the crtc is off, but perhaps it has to have been on previously for it to go low. I'm guessing that since this is a ppc system, the card was never previously posted, so the display was off before the driver loaded. This patch checks the current state of the crtcs at driver load time. That way we can skip disabling the display if it's already off. That said, the messages are harmless in this case.

Here is how we try to figure out where atombios is stuck. We use the atombios disassembler:

    git://people.freedesktop.org/~mhopf/AtomDis

To produce a readable file:

    ./atomdis bios.rom > bios.txt

Then when you get a message such as:

    *ERROR* atombios stuck executing C898 (len 62, WS 0, PS 0) @ 0xC8B4

it means it's stuck executing the function at offset 0xc898 (look for c898 in your disasm output; it's EnableCRTC). Inside that atombios function it's stuck in a loop. 0xC8B4 is the offset of the instruction at which the loop was interrupted (from one run to the other this offset might point to a different instruction in the same loop). So when you look at EnableCRTC, it's stuck executing 0xC8B4 - 0xC898 = 0x1c, which is:

    001c: 4aa59c1b01 TEST reg[1b9c] [.X..] <- 01
    0021: 491c00 JUMP_NotEqual 001c

So the test here checks whether that register (0x1b9c << 2), i.e. register 0x6e70, has a value of 0x..01...., or if you prefer:

    (READREG(0x6e70) & 0x00ff0000) == 0x00010000

Lucas, if you have any more atombios stuck messages, don't hesitate to add them here. To find the register meaning you can grep the various header files in drivers/gpu/drm/radeon/, mostly the evergreen ones and the modesetting ones.

Sorry, the command should be:

    ./atomdis bios.rom F > bios.txt

Hi Jerome, thanks for the tips.
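
The offset arithmetic described above can be sketched in a few lines of C. This is only an illustration of the steps in the comment (table-relative offset, register index to MMIO byte offset, and picking the byte that an atomdis lane marker such as `[.X..]` refers to). The helper names are hypothetical and are not part of the radeon driver or of AtomDis.

```c
#include <stdint.h>

/* Hypothetical helper: offset of the interrupted instruction relative to
 * the start of its command table, e.g. 0xC8B4 - 0xC898. */
static inline uint32_t atom_table_relative_offset(uint32_t table_base,
                                                  uint32_t stuck_at)
{
    return stuck_at - table_base;
}

/* Hypothetical helper: AtomBIOS register operands are dword indices, so the
 * MMIO byte offset is the index shifted left by 2 (0x1b9c becomes 0x6e70). */
static inline uint32_t atom_reg_index_to_mmio(uint32_t reg_index)
{
    return reg_index << 2;
}

/* Hypothetical helper: extract one byte lane of a 32-bit value.
 * Lane 0 is [...X] (bits 7:0), lane 2 is [.X..] (bits 23:16). */
static inline uint8_t atom_byte_lane(uint32_t value, unsigned lane)
{
    return (uint8_t)(value >> (8u * (lane & 3u)));
}
```

With the numbers from the error message, `atom_table_relative_offset(0xC898, 0xC8B4)` gives 0x1c and `atom_reg_index_to_mmio(0x1b9c)` gives 0x6e70, matching the values worked out by hand in the comment.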
Well, I followed the next error

    [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
    [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CC68 (len 72, WS 0, PS 0) @ 0xCC97

down to the test at 0x2f in 0xcc68.

    command_table 0000cc68 #2c (UpdateCRTC_DoubleBufferRegisters):
    ...
    0027: 5420b51b CLEAR reg[1bb5] [...X]
    002b: 5420bd1b CLEAR reg[1bbd] [...X]
    002f: 4a25b61b01 TEST reg[1bb6] [...X] <- 01

I have a question here: how do I determine what these registers are? I couldn't match 1bb6 to anything in the radeon driver code, so I suppose that's somewhere else... or is there some other way to read that?

Anyway, I backtracked that code to this call in atombios_crtc.c:

    static void atombios_lock_crtc(struct drm_crtc *crtc, int lock)
    {
        ...
        int index = GetIndexIntoMasterTable(COMMAND, UpdateCRTC_DoubleBufferRegisters);
        ...
        atom_execute_table(rdev->mode_info.atom_context, index, (uint32_t *)&args);
    }

which could've come from either of these:

    static void atombios_crtc_prepare(struct drm_crtc *crtc)
    static void atombios_crtc_commit(struct drm_crtc *crtc)

Since those are callbacks registered as helper funcs, and I'm not sure of their semantics, I ended up getting stuck :)

    static const struct drm_crtc_helper_funcs atombios_helper_funcs = {
        .prepare = atombios_crtc_prepare,
        .commit = atombios_crtc_commit,

Any ideas here? Thanks! :)

Never mind the question about the registers, I just re-read your post, which I should've done in the first place :) Thanks

UpdateCRTC_DoubleBufferRegisters takes the crtc hardware lock so that updates happen atomically rather than as double buffered updates during the vupdate period. You pass parameters to the atom table via a struct, in this case ENABLE_CRTC_PS_ALLOCATION.

    args.ucCRTC = radeon_crtc->crtc_id;
    args.ucEnable = lock;

    0006: 370000 SET_ATI_PORT 0000 (INDIRECT_IO_MM)

Select the mmio register aperture.
    0009: 5214 CALL_TABLE 14 (ASIC_StaticPwrMgtStatusChange/SetUniphyInstance)

SetUniphyInstance updates the offset for the selected crtc based on the args.ucCRTC parameter.

    000b: 0765b61bfe AND reg[1bb6] [..X.] <- fe

This enables double buffering.

    0010: 3d650001 COMP param[00] [..X.] <- 01

This checks the params to see if we are enabling the lock (args.ucEnable = ATOM_ENABLE) or disabling the lock (args.ucEnable = ATOM_DISABLE).

    0014: 443b00 JUMP_Equal 003b

If args.ucEnable == ATOM_ENABLE, jump to table offset 0x003b.

Drop the lock (args.ucEnable = ATOM_DISABLE):

    0017: 5430761a CLEAR reg[1a76] [.X..]
    001b: 54306e1a CLEAR reg[1a6e] [.X..]
    001f: 5430271a CLEAR reg[1a27] [.X..]
    0023: 5430111a CLEAR reg[1a11] [.X..]
    0027: 5420b51b CLEAR reg[1bb5] [...X]
    002b: 5420bd1b CLEAR reg[1bbd] [...X]
    002f: 4a25b61b01 TEST reg[1bb6] [...X] <- 01

This tests the CRTC_DOUBLE_BUFFER_CONTROL.CRTC_UPDATE_PENDING bit.

    0034: 492f00 JUMP_NotEqual 002f

If the bit is high, we jump back to 0x002f. If the bit is low, we're done.

    0037: 3a0000 SET_REG_BLOCK 0000
    003a: 5b EOT

Take the lock (args.ucEnable = ATOM_ENABLE):

    003b: 0d25bd1b01 OR reg[1bbd] [...X] <- 01
    0040: 54009e1b CLEAR reg[1b9e] [XXXX]
    0044: 3a0000 SET_REG_BLOCK 0000
    0047: 5b EOT

Just like in the other table, for some reason, the bit never goes low.

Thanks for clarifying those things!

Well, I ran into a brand new set of questions while pursuing this.

> 0006: 370000 SET_ATI_PORT 0000 (INDIRECT_IO_MM)
> Select the mmio register aperture.

This sounds like selecting BARs, but from what I see, Region 0 would be the framebuffer (256M) and Region 2 would be the MMIO registers. Or how are those addresses mapped from within the adapter? Or does that mean that there are multiple register banks and you're picking one?

> 0009: 5214 CALL_TABLE 14 (ASIC_StaticPwrMgtStatusChange/SetUniphyInstance)
> SetUniphyInstance updates the offset for the selected crtc based on args.ucCRTC parameter.

How are parameters passed here?
Does it get the same parameters that the first call received? I take it the reference to param[00] there means ucCRTC, then. Is that it?

> 0010: 3d650001 COMP param[00] [..X.] <- 01
> This checks the params to see is we are enabling the lock (args.ucEnable = ATOM_ENABLE) or disabling the lock (args.ucEnable = ATOM_DISABLE)

Ok, so why is param[00] now referencing ucEnable? What is the reference to ucCRTC here?

> 0034: 492f00 JUMP_NotEqual 002f
> If the bit is high, we jump back to 0x002f. If the bit is low, we're done.

So, the bit being low here means we don't have an update pending. Does it being high mean that the lock is still in effect (i.e. the CLEAR commands didn't take the disables down)?

> 0044: 3a0000 SET_REG_BLOCK 0000
> 0047: 5b EOT

This seems to me like stack cleanup and return (I'm guessing EOT is End Of Table). Is that correct?

On the kernel driver side, I couldn't find who is calling, or what the purpose is of, the crtc_prepare and crtc_commit functions, which are apparently the only ones using this call (atombios_lock_crtc). What are they meant to do?

Thanks

(In reply to comment #13)
> Thanks for clarifying those things!
>
> Well, I ran into a brand new set of questions while pursuing this.
>
> > 0006: 370000 SET_ATI_PORT 0000 (INDIRECT_IO_MM)
> > Select the mmio register aperture.
>
> This sounds like selecting BARs, but from what I see, Region 0 would be the framebuffer (256M) and Region 2 would be the MMIO registers. Or how are those addresses mapped from within the adapter? Or does that mean that there are multiple register banks and you're picking one?

No. There's only one register BAR. It's for selecting between the register BAR and pci config registers. See atom_op_setport(). I've never seen a table actually use anything other than the register BAR, however.

> > 0009: 5214 CALL_TABLE 14
> > (ASIC_StaticPwrMgtStatusChange/SetUniphyInstance)
> > SetUniphyInstance updates the offset for the selected crtc based on args.ucCRTC
> > parameter.
>
> How are parameters passed here? Does it get the same parameters that the first call received? I take it, the reference for param[00] there means ucCRTC, then. Is that it?

They are passed to the table for execution. See atom_execute_table(). That function takes an atom context, an index (which table to execute), and a pointer to the parameter struct.

> > 0010: 3d650001 COMP param[00] [..X.] <- 01
> > This checks the params to see is we are enabling the lock (args.ucEnable = ATOM_ENABLE) or disabling the lock (args.ucEnable = ATOM_DISABLE)
>
> Ok, so, why is now param[00] referencing ucEnable? What is the reference to ucCRTC here?

See atombios_lock_crtc(). Use this parameter struct with the UpdateCRTC_DoubleBufferRegisters table:

    typedef struct _ENABLE_CRTC_PARAMETERS
    {
        UCHAR ucCRTC;
        UCHAR ucEnable;
        UCHAR ucPadding[2];
    }ENABLE_CRTC_PARAMETERS;

See atombios.h. The parameter struct is 1 dword. The first byte is ucCRTC and the second byte is ucEnable.

> > 0034: 492f00 JUMP_NotEqual 002f
> > If the bit is high, we jump back to 0x002f. If the bit is low, we're done.
>
> So, the bit being low here means we don't have an update pending. Does it being high mean that the lock is still in effect (i.e. the CLEAR commands didn't take the disables down?)?

If the bit is high it means there is an update pending, e.g. some change in the crtc state hasn't gone through yet. I'm not sure why you are seeing it stuck high.

> > 0044: 3a0000 SET_REG_BLOCK 0000
> > 0047: 5b EOT
>
> This seems to me like stack cleanup and return (I'm guessing EOT is End Of Table). Is that correct?

Yes, correct.

> On the kernel driver side, I couldn't find who is calling, or what's the purpose of the crtc_prepare and crtc_commit functions, which are the only ones apparently using this call (atombios_lock_crtc). What are they meant to do?

crtc_prepare() and crtc_commit() are called before and after a modeset on the crtc object.
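
To make the "1 dword, first byte ucCRTC, second byte ucEnable" layout above concrete, here is a minimal sketch of how the two fields end up in byte lanes of param[00]. It assumes the parameter dword is interpreted with ucCRTC in the low byte, which is what the `[..X.]` lane used for ucEnable in `COMP param[00] [..X.] <- 01` suggests. The struct and helper names are illustrative stand-ins, not the definitions from atombios.h.

```c
#include <stdint.h>

/* Illustrative stand-in for ENABLE_CRTC_PARAMETERS (hypothetical name). */
struct enable_crtc_params_sketch {
    uint8_t ucCRTC;       /* byte 0 of the parameter dword */
    uint8_t ucEnable;     /* byte 1 of the parameter dword */
    uint8_t ucPadding[2]; /* bytes 2 and 3 */
};

/* Hypothetical helper: build param[00] explicitly with shifts so the result
 * does not depend on host endianness. Assumes ucCRTC sits in the low byte. */
static inline uint32_t pack_enable_crtc_params(
    const struct enable_crtc_params_sketch *p)
{
    return (uint32_t)p->ucCRTC |
           ((uint32_t)p->ucEnable << 8) |
           ((uint32_t)p->ucPadding[0] << 16) |
           ((uint32_t)p->ucPadding[1] << 24);
}

/* Hypothetical helper: lane 0 is [...X], lane 1 is [..X.], and so on. */
static inline uint8_t param_byte_lane(uint32_t dword, unsigned lane)
{
    return (uint8_t)(dword >> (8u * (lane & 3u)));
}
```

Under that assumption, lane 0 of param[00] carries ucCRTC and lane 1 carries ucEnable, which is why the same param[00] can refer to either field depending on the lane marker.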
See drm_crtc_helper_set_mode() in drm_crtc_helper.c. In atombios_crtc_prepare() we take the crtc hardware lock so that all updates will happen atomically, then we disable the crtc. Then in atombios_crtc_mode_set() we set up the pll, set the crtc timing, graphics plane base address, and scaler. Finally in atombios_crtc_commit() we enable the crtc and drop the crtc hardware lock.

Can someone clarify something here? Is the bit that we are waiting to go low a bit in the adapter's memory? If so, is it the adapter hardware that we are waiting on to set this bit? Is there any way to dump the adapter to determine its state when we hit the timeout?

(In reply to comment #15)
> Can someone clarify something here? Is the bit that we are waiting to go low a bit in the adapter's memory? If so, is it the adapter hardware that we are waiting to set this bit?

It's a memory mapped register. We are waiting for one of the display related bits to go low, so it's the GPU that would be setting that bit. The timeout is in the driver. We eventually drop out of the loop in the atom interpreter if we get stuck after a certain number of seconds.

> Is there anyway to dump the adapter to determine its state when we hit the timeout?

You can dump the mmio registers.

Created attachment 75176 [details] [review]
Dumping registers to investigate values change

Ok, so now I've tried dumping the register we're waiting for using this patch, and the output looks like this:

    OR_REG @ 0xD8EA
    EVERGREEN_CRTC_BLANK_CONTROL: 0001
    0x6ED8: 10000
       dst: REG[0x19A4].[7:0] -> 0x04
       src: PS[0x00,0x0000].[7:0] -> 0x00
       dst: REG[0x19A4].[7:0] <- 0x04
    EOT @ 0xD8EF
    EVERGREEN_CRTC_BLANK_CONTROL: 0001
    0x6ED8: 10000
    <<
    >> execute E82E (len 91, WS 0, PS 0)
    MOVE_PS @ 0xE834
    EVERGREEN_CRTC_BLANK_CONTROL: ffffffff
    0x6ED8: ffffffff

I'm dumping 0x6ED8 as it is the register whose bit never goes down. Following this, all references to either register are all F's.
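
A check for the "all F's" condition in the dump above could look like the sketch below. It is only an illustration: the helper names are hypothetical, and because a single 0xffffffff can be a legitimate register value, the second helper flags the device only when every sampled register reads all ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if one 32-bit register read came back all ones. */
static inline bool reg_read_is_all_ones(uint32_t value)
{
    return value == 0xffffffffu;
}

/* Hypothetical helper: true only if at least one register was sampled and
 * every sampled value is all ones. */
static inline bool all_reads_are_all_ones(const uint32_t *values, size_t n)
{
    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (!reg_read_is_all_ones(values[i]))
            return false;
    }
    return true;
}
```

Applied to the two registers in the dump, the values before `execute E82E` would not trip the check, while the pair read at `MOVE_PS @ 0xE834` would.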
I'm wondering if this could be my testing interfering with the adapter operation, or if this is really what's going on, as it could indicate other problems. Can I be dumping these registers there? Does that interfere with tests? Should I dump another register for testing? Which one would be best?

From the 0xD8EA address, I can conclude it was executing the DAC1OutputControl function from the atombios, which exited successfully. I'm investigating what happens afterwards that triggers this. Is it interrupt activation? Right now we're having to use LSIs, so it might be a problem there.

Thanks

So when all registers return 0xffffffff it's because something went horribly wrong. Either the GPU memory controller is locked up or in a bad state, or the IOMMU is blocking things. My guess is that enabling the crtc to start scanning triggers requests to the GPU memory controller and those requests point to a bad address. I would triple check the memory controller setup and that the crtc base register points to valid vram inside the GPU memory address space. Also, does the ring/ib test that happens prior to any modesetting report success?

Created attachment 75183 [details] [review]
dumping

Ok, so dump the regs that might trigger the GPU memory controller to start faulting.

Created attachment 75196 [details] [review]
Fixes on the Workaround

Ok, there were some minor issues with the workaround, which are fixed here. The output is:

    [drm] ring test on 3 succeeded in 1 usecs
    [drm] GRPH_PRIMARY_SURFACE[ 0] 0x0000000000000000
    [drm] GRPH_SECONDARY_SURFACE[ 0] 0x0000000000000000
    [drm] GRPH_PRIMARY_SURFACE[ 1] 0x0000000000000000
    [drm] GRPH_SECONDARY_SURFACE[ 1] 0x0000000000000000
    [drm] GRPH_PRIMARY_SURFACE[ 2] 0x0000000000000000
    [drm] GRPH_SECONDARY_SURFACE[ 2] 0x0000000000000000
    [drm] GRPH_PRIMARY_SURFACE[ 3] 0x0000000000000000
    [drm] GRPH_SECONDARY_SURFACE[ 3] 0x0000000000000000
    [drm] ib test on ring 0 succeeded in 0 usecs

And there's no longer a "stuck in loop" message, but the registers do become all f's.
I'm investigating exactly where, to see if it's still the same issue.

What is weird is that it's showing all regs with a 0 value, which would mean that my patch does nothing, yet you still seem to get further along. Probably me doing bad casting; can you add (uint64_t) in front of each RREG32 in my patch and see if it still prints 0000000000000?

Created attachment 75313 [details] [review]
Fixes on the Workaround

Hi Jerome, this is the patch I actually used. I had already done what you said and also removed a couple of left shifts you had added upon reading the low words. The results are still the same, though.

Right now I'm tracing it by going through the code between the call to DAC1OutputControl in radeon_atom_encoder_dpms_avivo and the call to DPEncoderService in either radeon_dp_encoder_service or radeon_dp_link_train and instrumenting it. I'm not sure which one yet, but I'm guessing the first. This is to make sure the driver is not doing anything else that could be going wrong in between calls, because after that DAC1OutputControl call everything is still fine; it's somewhere in between those calls that the adapter goes to hell.

Another thing that I'd like to ask: you suggested that I "make sure the pipes are off". I've looked through the registers for something to get the DAC state or disable them, and have not found it. As far as I can tell, your patch would already make sure we're not doing improper access, but are there any more interesting registers I should be looking into as well?

Thanks

If the card is not posted by the sbios, the display hardware is disabled until the driver attempts to initialize it. The display controller enable bit is bit 0 of CRTC_CONTROL (0x6e70 + crtc_offset). The DAC enable bit is bit 0 of DAC_ENABLE (0x6790).
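
The two enable bits named above can be tested with a small helper once the register values have been read. This is a sketch under stated assumptions: the macro and function names are hypothetical (they are not the definitions from the radeon headers), and the per-crtc offset is left as a caller-supplied parameter because its values are not given in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register offsets as quoted in the comment above; names are hypothetical. */
#define SKETCH_CRTC_CONTROL_BASE 0x6e70u
#define SKETCH_DAC_ENABLE_REG    0x6790u
#define SKETCH_ENABLE_BIT        0x1u /* bit 0 in both registers */

/* MMIO byte offset of CRTC_CONTROL for one crtc; crtc_offset is whatever
 * per-crtc register offset the caller uses (an assumption here). */
static inline uint32_t sketch_crtc_control_reg(uint32_t crtc_offset)
{
    return SKETCH_CRTC_CONTROL_BASE + crtc_offset;
}

/* True if bit 0 of an already-read register value is set. */
static inline bool sketch_enable_bit_set(uint32_t reg_value)
{
    return (reg_value & SKETCH_ENABLE_BIT) != 0;
}
```

A caller would read the register at `sketch_crtc_control_reg(crtc_offset)` or at `SKETCH_DAC_ENABLE_REG` with its own MMIO accessor and pass the value to `sketch_enable_bit_set()`.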
Created attachment 75640 [details] [review] Adding tests for all-1s after every read or write Ok, so after applying the refered to patch, I got several false WARN_ONs (where the adapter keeps working, so it's just a regular 0xFF), and at one point, I start getting real all-1s. That place is this: WS[0x41].[31:24] <- 0x23 MOVE_REG @ 0xD99C EVERGREEN_CRTC_BLANK_CONTROL: 0001 0x6ED8: 10000 src: WS[0x41].[31:0] -> 0x2304FFFF dst: REG[0x018A]------------[ cut here ]------------ WARNING: at drivers/gpu/drm/radeon/radeon_device.c:111 Modules linked in: radeon(+) drm_kms_helper ttm drm i2c_algo_bit i2c_core autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 sg ibmveth shpchp ext4(F) jbd2(F) mbcache(F) sr_mod(F) cdrom(F) sd_mod(F) crc_t10dif(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) NIP: d0000000069c2110 LR: d0000000069c2104 CTR: c000000000677f00 REGS: c00000000590a540 TRAP: 0700 Tainted: GF W (3.8.0+) MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 28222482 XER: 0000000b SOFTE: 1 CFAR: d000000006a029d0 TASK = c0000001ecd0b680[2589] 'modprobe' THREAD: c000000005908000 CPU: 4 GPR00: d0000000069c2104 c00000000590a7c0 d000000006abfd00 00000000ffffffff GPR04: 0000000000000001 0000000000000000 0000000000000000 000000002304ffff GPR08: 0000000030783031 c000000001067c50 000000000b3193b0 c000000000677f00 GPR12: d000000006a76f30 c00000000edd0c00 00000080646700a0 0000000000000000 GPR16: 0000010003bd0100 0000000000000000 c00000000590bc78 0000000000000030 GPR20: c00000000590aa58 c00000000590aa50 0000000000000001 c0000001e5082000 GPR24: c000000006fd8c80 000000002304ffff 000000000000018a 00000000ffffffff GPR28: 0000000000000000 000000002304ffff d000000006ab66d8 c0000001e5082000 NIP [d0000000069c2110] .cail_reg_write+0x50/0x70 [radeon] LR [d0000000069c2104] .cail_reg_write+0x44/0x70 [radeon] Call Trace: [c00000000590a7c0] [d0000000069c2104] 
.cail_reg_write+0x44/0x70 [radeon] (unreliable) [c00000000590a850] [d0000000069d9530] .atom_put_dst+0x110/0x710 [radeon] [c00000000590a920] [d0000000069dadd0] .atom_op_move+0xf0/0x1d0 [radeon] [c00000000590a9e0] [d0000000069db1c4] .atom_execute_table_locked+0x314/0x3a0 [radeon] [c00000000590aaf0] [d0000000069db5f8] .atom_op_calltable+0x108/0x170 [radeon] [c00000000590ab80] [d0000000069db1c4] .atom_execute_table_locked+0x314/0x3a0 [radeon] [c00000000590ac90] [d0000000069db5f8] .atom_op_calltable+0x108/0x170 [radeon] [c00000000590ad20] [d0000000069db1c4] .atom_execute_table_locked+0x314/0x3a0 [radeon] [c00000000590ae30] [d0000000069db2a4] .atom_execute_table+0x54/0x80 [radeon] [c00000000590aed0] [d0000000069db474] .atom_asic_init+0x1a4/0x220 [radeon] [c00000000590afb0] [d000000006a520e8] .evergreen_init+0x108/0x330 [radeon] [c00000000590b040] [d0000000069c1d28] .radeon_device_init+0x578/0x6f0 [radeon] [c00000000590b0e0] [d0000000069c48c0] .radeon_driver_load_kms+0xc0/0x180 [radeon] [c00000000590b180] [d000000004eef200] .drm_get_pci_dev+0x1e0/0x2d0 [drm] [c00000000590b240] [d0000000069a023c] .radeon_pci_probe+0xbc/0x100 [radeon] [c00000000590b2d0] [c000000000359374] .local_pci_probe+0x64/0xb0 [c00000000590b370] [c000000000359488] .pci_call_probe+0xc8/0xf0 [c00000000590b410] [c00000000035a570] .pci_device_probe+0x90/0xb0 [c00000000590b4a0] [c000000000412004] .really_probe+0xb4/0x370 [c00000000590b550] [c000000000412320] .driver_probe_device+0x60/0xe0 [c00000000590b5e0] [c0000000004124ac] .__driver_attach+0x10c/0x110 [c00000000590b670] [c00000000040f7a8] .bus_for_each_dev+0x98/0xf0 [c00000000590b720] [c000000000411b28] .driver_attach+0x28/0x40 [c00000000590b7a0] [c0000000004106a8] .bus_add_driver+0x188/0x320 [c00000000590b840] [c000000000412c7c] .driver_register+0x9c/0x1c0 [c00000000590b8e0] [c00000000035a6b8] .__pci_register_driver+0x48/0x60 [c00000000590b960] [d000000004eef45c] .drm_pci_init+0x16c/0x1a0 [drm] [c00000000590ba10] [d000000006a76c14] 
.radeon_init+0x108/0xa414 [radeon]
[c00000000590baa0] [c00000000000acc4] .do_one_initcall+0x64/0x1e0
[c00000000590bb60] [c0000000000fb0c8] .do_init_module+0x68/0x1e0
[c00000000590bc00] [c0000000000fc634] .load_module+0x8b4/0x9c0
[c00000000590bd30] [c0000000000fca18] .SyS_init_module+0x118/0x160
[c00000000590be30] [c000000000009954] syscall_exit+0x0/0x94
Instruction dump:
e9230000 ebe90330 7fe3fb78 480406e5 60000000 7fe3fb78 7fa4eb78 38a00000
480407c1 60000000 2f83ffff 409e0008 <0fe00000> 38210090 e8010010 eba1ffe8
---[ end trace 7065b906d56b6c01 ]---
.[31:0] <- 0x2304FFFF
AND_REG @ 0xD9A1

Which seems to imply that things go bad at AtomBIOS function #10, MemoryPLLInit, when it executes this instruction @94:

    0079: 0300418a01 MOVE WS_REMIND/HI32 [XXXX] <- reg[018a] [XXXX]
    007e: 5e05410000f7dfffff0001 MASK WS_REMIND/HI32 [XXXX] & dff70000 | 0100ffff
    0089: 4ba50102 TEST param[01] [.X..] <- 02
    008d: 449400 JUMP_Equal 0094
    0090: 0fe54120 OR WS_REMIND/HI32 [X...] <- 20
    0094: 01028a0141 MOVE reg[018a] [XXXX] <- WS_REMIND/HI32 [XXXX]

Any thoughts on this? I've been trying to make heads or tails of what exactly this means for a few hours now. I know it's initializing the PLL; what I don't get is why zeroing out bits 29 and 19, and then setting bit 24, would cause invalid memory accesses. Or did my test influence the flow of the program, and I shouldn't be reading this register shortly after writing to it?

Btw, the calling path here seems to be evergreen_init -> atom_asic_init -> (ASIC_Init) -> (SetMemoryClock) -> (MemoryPLLInit)

I don't know what this register does; maybe Alex can shed some light.

I'm trying to find out more internally. Does the card work on an x86 system (even just checking to see if the bios post screen is fine)? I just want to confirm that it's not an issue with the card itself.

Well, I don't have an x86 system to test that on. I could get one, in time.
What I do have are two different adapters, bought separately, on two different ppc64 systems with the exact same error. This makes me think the adapters are fine :)

I just altered the patch, removing the reads that were forced after writes to make it less intrusive, and the results are the same.

After some further investigation, we found that despite the fact that we were disabling MSIs, the adapter was still using them. After we provided a 32-bit address to it, we got it to work properly. The solution to this will have to be done outside of software, so I'm closing the bug.

Thanks for all the help, guys
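
For reference, the MASK step from the MemoryPLLInit excerpt earlier in the thread (`& dff70000 | 0100ffff`, followed by the optional `OR ... [X...] <- 20`) can be reproduced with plain bit arithmetic. This is a sketch only: the helper names are hypothetical, and the starting register value used in the usage note is made up for illustration, since the thread does not show what reg[018a] held before the write.

```c
#include <stdint.h>

/* Hypothetical helper: MASK as printed by atomdis, dst = (dst & and_mask) | or_value. */
static inline uint32_t sketch_atom_mask(uint32_t dst, uint32_t and_mask,
                                        uint32_t or_value)
{
    return (dst & and_mask) | or_value;
}

/* Hypothetical helper: the bits an AND mask forces to zero. */
static inline uint32_t sketch_bits_cleared_by(uint32_t and_mask)
{
    return ~and_mask;
}

/* Hypothetical helper: OR an 8-bit immediate into the top byte lane [X...]. */
static inline uint32_t sketch_or_top_byte(uint32_t dst, uint8_t imm)
{
    return dst | ((uint32_t)imm << 24);
}
```

Counting from bit 0, `sketch_bits_cleared_by(0xdff70000)` is 0x2008ffff, so the AND clears bits 29 and 19 plus the low 16 bits, and the OR with 0x0100ffff then sets bit 24 and the low 16 bits. Starting from a hypothetical 0x02040000, the MASK followed by the `OR ... <- 20` step yields 0x2304ffff, the value seen in the trace.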