weixin_39595487 2020-11-30 03:02

[OGRE-335] "endless world" terrain sample crashes with shader compilation issues

[reporter="loath", created="Mon, 18 Nov 2013 07:14:36 +0100"]

Built Ogre and the samples from 1.9 RC2 using VS2013 on Windows 8.1, with Boost 1.55 and DX9.

The Endless World sample crashes on both my desktop and my laptop. Logs from both computers are attached.

Repro: fly in one direction for about 30-60 seconds. Sometimes the crash happens right away, but it has taken up to 15 minutes to repro.

Cause: the issue appears "randomly", so it might be a race condition or heap, process, or compiler state corruption.

The sequence of events leading up to the crash is as follows:

1) The Ogre terrain material generator creates a string containing the Cg source for the terrain shader.

  • This happens very often: once for each new page being drawn.
  • I'm not sure why this can't be cached, but I didn't look into that code.
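On the caching question: if the generated source is identical from page to page, the compiled output could in principle be reused instead of recompiled. A minimal sketch, assuming a hypothetical CompiledShaderCache class (not part of Ogre's API) and a caller-supplied compile function that stands in for the Cg-to-assembly step:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache, not Ogre code: maps generated Cg source text to the
// compiled assembly so identical source is compiled only once.
class CompiledShaderCache {
public:
    using Compiler = std::function<std::string(const std::string&)>;

    explicit CompiledShaderCache(Compiler compile) : compile_(std::move(compile)) {}

    // Returns the cached assembly for `source`, compiling it on first use.
    // References stay valid because unordered_map never moves its elements.
    const std::string& get(const std::string& source) {
        auto it = cache_.find(source);
        if (it != cache_.end()) {
            ++hits_;
            return it->second;
        }
        ++misses_;
        return cache_.emplace(source, compile_(source)).first->second;
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    Compiler compile_;
    std::unordered_map<std::string, std::string> cache_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

One design consequence: a cache like this would also reuse a bad compile result for every later page, so the output would need to be validated before it is stored.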

2) This Cg source is compiled into an HLSL assembly version of the same program, also as a string.

(See void CgProgram::compileMicrocode(void) in OgreCgProgram.cpp, line 322.)

  • Note that the program has no parameters. (Are there other mechanisms that control compilation results, such as global const defines?)
  • In both the normal and the crashing scenario, this Cg source string is 1082 bytes,
    i.e. we appear to generate the Cg source consistently; it's always exactly the same.
  • Once compiled into HLSL assembly, however, the size of the assembly differs depending on whether the error occurs.
  • If the error is about to occur, the assembly string is always exactly 0x7ff bytes; otherwise it is slightly smaller.
  • The assembly string is larger in the failure case because a constant register definition that should look something like:
      def c16, 1.00000000, 0.00000000, 0, 0
    looks like the following instead:
      def c16, 1.00000000, -1.#IND0000, 0.00000000, 0

  • The failure case appears to contain a NaN constant, perhaps from some error that occurs during Cg compilation in cgCreateProgram()
    inside CgProgram::compileMicrocode()?
  • The actual "crash" occurs later, inside D3D9GpuProgram::compileMicrocode().
  • The HLSL assembly from above is passed to D3DXAssembleShader().
  • IF the HLSL contains the NaN from above, the compilation fails.
  • Ogre throws an exception, and the process goes down.
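Rather than relying on the 0x7ff size, the NaN constant itself could be detected before the assembly reaches D3DXAssembleShader(). A minimal sketch, assuming def lines follow the layout shown in the report; hasNonFiniteDef is a hypothetical helper, not Ogre code. It flags any def component that strtod cannot fully parse (such as -1.#IND0000, the NaN text seen in the failing output) or that parses to NaN or infinity:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>

// Strip leading and trailing spaces, tabs, and carriage returns.
static std::string trimSpace(const std::string& s) {
    const std::size_t b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Hypothetical helper, not part of Ogre: returns true if any "def" line in
// D3D9 shader assembly has a constant component that is not a finite number.
bool hasNonFiniteDef(const std::string& assembly) {
    std::istringstream lines(assembly);
    std::string line;
    while (std::getline(lines, line)) {
        line = trimSpace(line);
        if (line.compare(0, 4, "def ") != 0) continue;  // only constant definitions

        // Fields after "def ": the register name (e.g. c16), then the values.
        std::istringstream fields(line.substr(4));
        std::string field;
        bool isRegister = true;
        while (std::getline(fields, field, ',')) {
            if (isRegister) { isRegister = false; continue; }
            const std::string tok = trimSpace(field);
            char* end = nullptr;
            const double v = std::strtod(tok.c_str(), &end);
            // Suspect if strtod stops early (e.g. at '#' in "-1.#IND0000")
            // or the value parses to NaN or infinity (e.g. "nan", "inf").
            if (tok.empty() || *end != '\0' || !std::isfinite(v)) return true;
        }
    }
    return false;
}
```

Run on the compiled string inside CgProgram::compileMicrocode(), a check like this could flag the bad output where it is produced, rather than when D3D9GpuProgram::compileMicrocode() fails later.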

Next steps:

  • Since the error appears rooted in the compilation to HLSL assembly, I thought perhaps there was a bug in the Cg compiler.
  • I tried replacing the .10 version of cg.dll that came with the Ogre dependencies download with the latest one from NVIDIA (ver .13), and was still able to repro the issue.
  • I added some hackery to CgProgram::compileMicrocode() to RECOMPILE whenever a program of size 0x7ff bytes was returned,
    i.e. if I'm going to hit the error case anyway, recompile and see whether we get a better result.
  • This appears to always succeed (I never saw a double failure, though maybe that's just extremely unlikely).
  • From this point on the program continues to run fine, i.e. the expected crash never occurs.
  • Obviously this is a horrible hack that we can't use, but it clearly demonstrates a "transient" error.
  • Next, I modified the code at this point to loop forever, recompiling the program and watching for another 0x7ff-byte result, to see whether the race condition continues to manifest.
  • I left this running for 10-15 minutes and never saw another 0x7ff result.
  • I will try this again and let it run longer.
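The recompile-on-failure hack and the stress loop described above can be sketched generically. The callbacks are hypothetical stand-ins, not Ogre code: compile represents the Cg-to-assembly step in CgProgram::compileMicrocode(), and looksBad represents whatever flags a suspect result (here the 0x7ff-byte size from the report):

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical callback types standing in for the real compile step and check.
using CompileFn = std::function<std::string(const std::string&)>;
using CheckFn = std::function<bool(const std::string&)>;

// The size heuristic from the report: a failing compile produced exactly 0x7ff bytes.
bool hasSuspectSize(const std::string& assembly) { return assembly.size() == 0x7ff; }

// Recompile up to maxAttempts times until the output passes the check.
// Returns the last output; attemptsUsed reports how many compiles ran.
std::string compileWithRetry(const CompileFn& compile, const CheckFn& looksBad,
                             const std::string& source, int maxAttempts,
                             int& attemptsUsed) {
    std::string out;
    attemptsUsed = 0;
    while (attemptsUsed < maxAttempts) {
        out = compile(source);
        ++attemptsUsed;
        if (!looksBad(out)) break;
    }
    return out;
}

// Stress loop: compile the same source `iterations` times and count suspect
// results. Since the input never changes, any nonzero count means the fault
// is transient rather than caused by different source.
std::size_t countBadCompiles(const CompileFn& compile, const CheckFn& looksBad,
                             const std::string& source, std::size_t iterations) {
    std::size_t bad = 0;
    for (std::size_t i = 0; i < iterations; ++i)
        if (looksBad(compile(source))) ++bad;
    return bad;
}
```

Note that compileWithRetry returns the last output even when every attempt fails, so the caller still has to check the result before handing it on.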

Ideas?

  • Recompile Boost and Ogre 1.9 RC using VS2010.
  • This would eliminate the new compiler and the new Boost version as causes.

This question originates from the open-source project: OGRECave/ogre

11 answers

    [author="masterfalcon", created="Tue, 19 Nov 2013 07:40:08 +0100"]

    I can see the artifacts here, but I don't get the crash. However, I'm testing with GL on OS X. Have you tried running with GL to see whether you also get the crash with that render system?
