Thread Status: Active
Total posts in this thread: 14
This topic has been viewed 3321 times and has 13 replies
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Re: GPU units sticking

Coolstream, there is a file in the slot directory "stderr.txt". Can you paste the contents of one of those from a stuck workunit?

Thanks,
armstrdj
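
For anyone following along, here is a minimal Python sketch for locating those files. It assumes the BOINC data directory is `C:\ProgramData\BOINC` (the path that appears in the log posted further down) and that each running task has a numbered folder under `slots`. The function name `find_stderr_files` is only illustrative and is not part of BOINC.

```python
from pathlib import Path


def find_stderr_files(boinc_dir):
    """Return the stderr.txt path of every slot directory that has one.

    boinc_dir is the BOINC data directory (an assumption: adjust it to
    match your own installation).
    """
    slots = Path(boinc_dir) / "slots"
    if not slots.is_dir():
        # No slots folder here, so there is nothing to report.
        return []
    # Each slot is a numbered sub-folder; keep only those with a stderr.txt.
    return sorted(p for p in slots.glob("*/stderr.txt") if p.is_file())


if __name__ == "__main__":
    for path in find_stderr_files(r"C:\ProgramData\BOINC"):
        print(path)
```

The slot number of a given task is also recorded in the `<slot>` element of its init data, so that value can be used to pick the right folder.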
[Nov 14, 2012 2:58:23 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: GPU units sticking

What driver version are you running?


nanoprobe, I'm using ATI 12.10 on BOINC 7.0.36

Try upgrading to driver 12.11beta4. I have seen the problem you're having before, and the new 12.11beta4 driver seems to resolve it. There is also a 12.11beta6 out, but I have not tried it, so I would recommend you try the beta4 version first. The recommended installation procedure is to uninstall the 12.10 driver, run a program called Driver Sweeper to remove any leftover remnants, and then do a clean install of 12.11beta4. Hopefully this will cure your problem.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


[Nov 14, 2012 4:16:40 PM]
coolstream
Senior Cruncher
SCOTLAND
Joined: Nov 8, 2005
Post Count: 475
Re: GPU units sticking

Coolstream, there is a file in the slot directory "stderr.txt". Can you paste the contents of one of those from a stuck workunit?

Thanks,
armstrdj

Hi, thanks for picking this up for me.

I am adding stderr.txt from a stuck unit. This is one which I paused and then resumed; on resume, the unit completed and has been verified as valid.

I have coloured the text to differentiate between the stuck and good sessions. I hope this will be of use for you.

Commandline: projects/www.worldcommunitygrid.org/wcg_hcc1_img_7.05_windows_intelx86__ati_hcc1 --zipfile X0960075380493200610061842.zip --imagelist images.txt --device 1
<app_init_data>
<major_version>7</major_version>
<minor_version>0</minor_version>
<release>36</release>
<app_version>705</app_version>
<app_name>hcc1</app_name>
<project_preferences>


<color_scheme>Tahiti Sunset</color_scheme>
<max_frames_sec>7</max_frames_sec>
<max_gfx_cpu_pct>5.0</max_gfx_cpu_pct>
</project_preferences>

<project_dir>C:\ProgramData\BOINC/projects/www.worldcommunitygrid.org</project_dir>
<boinc_dir>C:\ProgramData\BOINC</boinc_dir>
<wu_name>X0960075380493200610061842</wu_name>
<result_name>X0960075380493200610061842_1</result_name>
<comm_obj_name>boinc_1</comm_obj_name>
<slot>1</slot>
<wu_cpu_time>0.000000</wu_cpu_time>
<starting_elapsed_time>0.000000</starting_elapsed_time>
<using_sandbox>0</using_sandbox>
<user_total_credit>24793864.372740</user_total_credit>
<user_expavg_credit>99431.719652</user_expavg_credit>
<host_total_credit>1359935.655251</host_total_credit>
<host_expavg_credit>39801.581456</host_expavg_credit>
<resource_share_fraction>1.000000</resource_share_fraction>
<checkpoint_period>60.000000</checkpoint_period>
<fraction_done_start>0.000000</fraction_done_start>
<fraction_done_end>1.000000</fraction_done_end>
<gpu_type>ATI</gpu_type>
<gpu_device_num>1</gpu_device_num>
<gpu_opencl_dev_index>1</gpu_opencl_dev_index>
<ncpus>1.000000</ncpus>
<rsc_fpops_est>26106628485363.000000</rsc_fpops_est>
<rsc_fpops_bound>522132569707260.000000</rsc_fpops_bound>
<rsc_memory_bound>78643200.000000</rsc_memory_bound>
<rsc_disk_bound>50000000.000000</rsc_disk_bound>
<computation_deadline>1353443811.000000</computation_deadline>
<vbox_window>0</vbox_window>
</app_init_data>
INFO: gpu_type set in init_data.xml to ATI
INFO: gpu_device_num set in init_data.xml to 1
Boinc requested ATI gpu device number1
Unzipping input images ../../projects/www.worldcommunitygrid.org/X0960075380493200610061842_X0960075380493200610061842.zip
Processing jobdescription
Number of Images defined in image list is 2
Found compute platform Advanced Micro Devices, Inc.
Selecting this platform
CL_DEVICE_NAME: Cypress
CL_DEVICE_VENDOR: Advanced Micro Devices, Inc.
CL_DEVICE_VERSION: 1016.4 (VM)
CL_DEVICE_MAX_COMPUTE_UNITS: 
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 256 / 256 / 256
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
CL_DEVICE_MAX_CLOCK_FREQUENCY: 725 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 512 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 1024 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 32 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_EXTENSIONS:
cl_khr_fp64
cl_amd_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_3d_image_writes
cl_khr_byte_addressable_store
cl_khr_gl_sharing
cl_ext_atomic_counters_32
cl_amd_device_attribute_query
cl_amd_vec3
cl_amd_printf
cl_amd_media_ops
cl_amd_popcnt
cl_khr_d3d10_sharing
cl_khr_dx9_media_sharing
Estimated kernel execution time = 0.33791 [sec]
Starting analysis of X0960075380493200610061842.jp2...
Extracting GLCM features...

Commandline: projects/www.worldcommunitygrid.org/wcg_hcc1_img_7.05_windows_intelx86__ati_hcc1 --zipfile X0960075380493200610061842.zip --imagelist images.txt --device 1
<app_init_data>
<major_version>7</major_version>
<minor_version>0</minor_version>
<release>36</release>
<app_version>705</app_version>
<app_name>hcc1</app_name>
<project_preferences>


<color_scheme>Tahiti Sunset</color_scheme>
<max_frames_sec>7</max_frames_sec>
<max_gfx_cpu_pct>5.0</max_gfx_cpu_pct>
</project_preferences>

<project_dir>C:\ProgramData\BOINC/projects/www.worldcommunitygrid.org</project_dir>
<boinc_dir>C:\ProgramData\BOINC</boinc_dir>
<wu_name>X0960075380493200610061842</wu_name>
<result_name>X0960075380493200610061842_1</result_name>
<comm_obj_name>boinc_1</comm_obj_name>
<slot>1</slot>
<wu_cpu_time>0.000000</wu_cpu_time>
<starting_elapsed_time>0.000000</starting_elapsed_time>
<using_sandbox>0</using_sandbox>
<user_total_credit>24801004.954473</user_total_credit>
<user_expavg_credit>99189.922793</user_expavg_credit>
<host_total_credit>1362023.968953</host_total_credit>
<host_expavg_credit>39620.257820</host_expavg_credit>
<resource_share_fraction>1.000000</resource_share_fraction>
<checkpoint_period>60.000000</checkpoint_period>
<fraction_done_start>0.000000</fraction_done_start>
<fraction_done_end>1.000000</fraction_done_end>
<gpu_type>ATI</gpu_type>
<gpu_device_num>1</gpu_device_num>
<gpu_opencl_dev_index>1</gpu_opencl_dev_index>
<ncpus>1.000000</ncpus>
<rsc_fpops_est>26106628485363.000000</rsc_fpops_est>
<rsc_fpops_bound>522132569707260.000000</rsc_fpops_bound>
<rsc_memory_bound>78643200.000000</rsc_memory_bound>
<rsc_disk_bound>50000000.000000</rsc_disk_bound>
<computation_deadline>1353443811.000000</computation_deadline>
<vbox_window>0</vbox_window>
</app_init_data>
INFO: gpu_type set in init_data.xml to ATI
INFO: gpu_device_num set in init_data.xml to 1
Boinc requested ATI gpu device number1
Unzipping input images ../../projects/www.worldcommunitygrid.org/X0960075380493200610061842_X0960075380493200610061842.zip
Processing jobdescription
Number of Images defined in image list is 2
Found compute platform Advanced Micro Devices, Inc.
Selecting this platform
CL_DEVICE_NAME: Cypress
CL_DEVICE_VENDOR: Advanced Micro Devices, Inc.
CL_DEVICE_VERSION: 1016.4 (VM)
CL_DEVICE_MAX_COMPUTE_UNITS: 
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 256 / 256 / 256
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
CL_DEVICE_MAX_CLOCK_FREQUENCY: 725 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 512 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 1024 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 32 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_EXTENSIONS:
cl_khr_fp64
cl_amd_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_3d_image_writes
cl_khr_byte_addressable_store
cl_khr_gl_sharing
cl_ext_atomic_counters_32
cl_amd_device_attribute_query
cl_amd_vec3
cl_amd_printf
cl_amd_media_ops
cl_amd_popcnt
cl_khr_d3d10_sharing
cl_khr_dx9_media_sharing
Estimated kernel execution time = 0.33089 [sec]
Starting analysis of X0960075380493200610061842.jp2...
Extracting GLCM features...
Total kernel time: 172.255692 (1026 kernel executions)
Total memory transfer time: 82.907997
Average kernel time: 0.167891
Min kernel time: 0.156787 (dx=23 dy=11 sample_dist=24 )
Max kernel time: 0.179105 dx=2 dy=0 sample_dist=1
INFO: GPU calculations complete.
Total time for X0960075380493200610061842.jp2: 567 seconds
Finished Image #0, pctComplete = 0.500000
Starting analysis of X0960075380943200610061835.jp2...
Extracting GLCM features...
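
A log like the one above can be checked mechanically. The following is a rough Python sketch, not anything shipped with BOINC or the HCC application: it splits a stderr.txt into sessions at each `Commandline:` line and lists the images that were started but have no matching `Total time for ...` line. The names `split_sessions` and `summarize` are made up for this example, and `SAMPLE` is an abbreviated copy of the lines above with the command-line arguments shortened. An image with no completion line is not necessarily stuck; the log may simply have been captured mid-run.

```python
import re

# Line patterns copied from the log format shown above.
START = re.compile(r"^Starting analysis of (\S+)\.\.\.$")
DONE = re.compile(r"^Total time for (\S+): (\d+) seconds$")

# Abbreviated excerpt of the log above (command-line arguments shortened).
SAMPLE = """\
Commandline: ... --device 1
Starting analysis of X0960075380493200610061842.jp2...
Extracting GLCM features...
Commandline: ... --device 1
Starting analysis of X0960075380493200610061842.jp2...
Extracting GLCM features...
Total kernel time: 172.255692 (1026 kernel executions)
Total time for X0960075380493200610061842.jp2: 567 seconds
Finished Image #0, pctComplete = 0.500000
Starting analysis of X0960075380943200610061835.jp2...
Extracting GLCM features...
"""


def split_sessions(text):
    """Group log lines into sessions; each session starts at 'Commandline:'."""
    sessions = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Commandline:"):
            sessions.append([])
        if sessions:  # ignore anything before the first session header
            sessions[-1].append(line)
    return sessions


def summarize(session):
    """Report finished images (with seconds) and images without a finish line."""
    started, finished = [], {}
    for line in session:
        match = START.match(line)
        if match:
            started.append(match.group(1))
            continue
        match = DONE.match(line)
        if match:
            finished[match.group(1)] = int(match.group(2))
    pending = [name for name in started if name not in finished]
    return {"finished": finished, "pending": pending}


if __name__ == "__main__":
    for number, session in enumerate(split_sessions(SAMPLE), start=1):
        print(number, summarize(session))
```

On the excerpt, the first session shows one image started and none finished, while the second shows the first image finished in 567 seconds and the second image still pending when the log was copied.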

----------------------------------------

Crunching in memory of my Mum PEGGY, cousin ROPPA and Aunt AUDREY.
[Nov 14, 2012 6:17:06 PM]
coolstream
Senior Cruncher
SCOTLAND
Joined: Nov 8, 2005
Post Count: 475
Re: GPU units sticking

What driver version are you running?


nanoprobe, I'm using ATI 12.10 on BOINC 7.0.36

Try upgrading to driver 12.11beta4. I have seen the problem you're having before, and the new 12.11beta4 driver seems to resolve it. There is also a 12.11beta6 out, but I have not tried it, so I would recommend you try the beta4 version first. The recommended installation procedure is to uninstall the 12.10 driver, run a program called Driver Sweeper to remove any leftover remnants, and then do a clean install of 12.11beta4. Hopefully this will cure your problem.

Thanks, I'll give that a try.

UPDATE: I updated one machine and left the other while I slept for a few hours in front of the TV (it's been exhausting shepherding these two for the past 24 hours).

The machine updated to the beta GPU driver has now been running without issue for approximately eight hours, whilst the other has the majority of its units stuck with outrageous runtimes. It's hardly the most scientific of tests, but it is enough to convince me that you could well have found a solution for me. I am therefore now in the process of updating the second machine.

Thanks for taking the time to help, nanoprobe!
----------------------------------------

Crunching in memory of my Mum PEGGY, cousin ROPPA and Aunt AUDREY.
----------------------------------------
[Edited 1 time, last edit by coolstream at Nov 15, 2012 7:09:40 AM]
[Nov 14, 2012 6:24:35 PM]