yoro42
Ace Cruncher
United States
Joined: Feb 19, 2011
Post Count: 8976
Status: Offline
Two Frozen Applications: hcc1 7.05 (ati_hcc1)

Two WUs tied up both GPUs for about 6.5 hours each. I aborted both tasks after collecting the data that follows.

Tasks are flowing again and finishing in about 8 minutes each.

=BOINC MGR Event Log
Computer: Coltrane

3/16/2013 3:11:55 PM | | Starting BOINC client version 7.0.28 for windows_x86_64
3/16/2013 3:11:55 PM | | log flags: file_xfer, sched_ops, task
3/16/2013 3:11:55 PM | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
3/16/2013 3:11:55 PM | | Data directory: C:\ProgramData\BOINC
3/16/2013 3:11:55 PM | | Running under account Jazzman
3/16/2013 3:11:55 PM | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz [Family 6 Model 44 Stepping 2]
3/16/2013 3:11:55 PM | | Processor: 256.00 KB cache
3/16/2013 3:11:55 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt aes pbe
3/16/2013 3:11:55 PM | | OS: Microsoft Windows 7: Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)
3/16/2013 3:11:55 PM | | Memory: 15.99 GB physical, 21.99 GB virtual
3/16/2013 3:11:55 PM | | Disk: 922.76 GB total, 715.68 GB free
3/16/2013 3:11:55 PM | | Local time is UTC -7 hours
3/16/2013 3:11:55 PM | | VirtualBox version: 4.1.18
3/16/2013 3:11:55 PM | | ATI GPU 0: Cypress (CAL version 1.4.1741, 1024MB, 991MB available, 4640 GFLOPS peak)
3/16/2013 3:11:55 PM | | ATI GPU 1: Cypress (CAL version 1.4.1741, 1024MB, 991MB available, 4640 GFLOPS peak)
3/16/2013 3:11:55 PM | | OpenCL: ATI GPU 0: Cypress (driver version 1124.2 (VM), device version OpenCL 1.2 AMD-APP (1124.2), 1024MB, 991MB available)
3/16/2013 3:11:55 PM | | OpenCL: ATI GPU 1: Cypress (driver version 1124.2 (VM), device version OpenCL 1.2 AMD-APP (1124.2), 1024MB, 991MB available)
3/16/2013 3:11:55 PM | | Config: report completed tasks immediately
3/16/2013 3:11:55 PM | | Config: GUI RPC allowed from:
3/16/2013 3:11:55 PM | | Config: 192.168.0.2
3/16/2013 3:11:55 PM | | Config: 192.168.0.3
3/16/2013 3:11:55 PM | | Config: 192.168.0.4
3/16/2013 3:11:55 PM | | Config: 192.168.0.5
3/16/2013 3:11:55 PM | | Config: 192.168.0.6
3/16/2013 3:11:55 PM | Test4Theory@Home | URL http://lhcathome2.cern.ch/test4theory/; Computer ID 22428; resource share 0
3/16/2013 3:11:55 PM | LHC@home 1.0 | URL http://lhcathomeclassic.cern.ch/sixtrack/; Computer ID 9990150; resource share 0
3/16/2013 3:11:55 PM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 6644160; resource share 0
3/16/2013 3:11:55 PM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 1926675; resource share 100
3/16/2013 3:11:55 PM | World Community Grid | General prefs: from World Community Grid (last modified 14-Mar-2013 16:47:48)
3/16/2013 3:11:55 PM | World Community Grid | Computer location: work
3/16/2013 3:11:55 PM | | General prefs: using separate prefs for work
3/16/2013 3:11:55 PM | | Preferences:
3/16/2013 3:11:55 PM | | max memory usage when active: 12281.17MB
3/16/2013 3:11:55 PM | | max memory usage when idle: 14737.40MB
3/16/2013 3:11:55 PM | | max disk usage: 50.00GB
3/16/2013 3:11:55 PM | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
3/16/2013 3:11:55 PM | | Not using a proxy
3/16/2013 3:13:17 PM | | Suspending computation - initial delay


X0930124900141201101191138_ 2-- Coltrane Error 3/17/13 04:33:09 3/17/13 23:31:07 0.01 / 6.53 62.1 / 0.0
X0930124900157201101191137_ 2-- Coltrane Error 3/17/13 04:33:09 3/17/13 23:31:07 0.01 / 6.53 62.1 / 0.0
=================================================BOINC MGR Event Log
3/17/2013 9:56:29 AM | World Community Grid | Starting task X0930124900157201101191137_2 using hcc1 version 705 (ati_hcc1) in slot 0
3/17/2013 4:28:03 PM | World Community Grid | task X0930124900157201101191137_2 aborted by user
=================================================

Project World Community Grid

Name X0930124900157201101191137_2

Application hcc1 7.05 (ati_hcc1)
Workunit name X0930124900157201101191137
State Running
Received 3/16/2013 9:33:12 PM
Report deadline 3/19/2013 4:45:09 PM
Estimated app speed 45.92 GFLOPs/sec
Estimated task size 24,216 GFLOPs
Resources 1 CPUs + 1 ATI GPU (device 0)
CPU time at last checkpoint 00:00:00
CPU time 00:00:23
Elapsed time 06:29:50
Estimated time remaining --
Fraction done 0.000%
Virtual memory size 123.27 MB
Working set size 80.12 MB
Directory slots/0
Process ID 2700
=================================================
=BOINC MGR Event Log
3/17/2013 9:56:31 AM | World Community Grid | Starting task X0930124900141201101191138_2 using hcc1 version 705 (ati_hcc1) in slot 7
3/17/2013 4:28:15 PM | World Community Grid | task X0930124900141201101191138_2 aborted by user
=================================================
Project World Community Grid

Name X0930124900141201101191138_2

Application hcc1 7.05 (ati_hcc1)
Workunit name X0930124900141201101191138
State Running
Received 3/16/2013 9:33:12 PM
Report deadline 3/19/2013 4:45:09 PM
Estimated app speed 45.92 GFLOPs/sec
Estimated task size 24,216 GFLOPs
Resources 1 CPUs + 1 ATI GPU (device 1)
CPU time at last checkpoint 00:00:00
CPU time 00:00:22
Elapsed time 06:29:49
Estimated time remaining --
Fraction done 0.000%
Virtual memory size 121.13 MB
Working set size 78.53 MB
Directory slots/7
Process ID 1128
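
As a rough sanity check on the figures above (a sketch only; the helper and variable names are mine, not BOINC's), dividing the estimated task size by the estimated app speed gives the runtime the client expected, which can then be compared with the elapsed time at the point the task was aborted:

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string, as shown in the task properties, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the task properties quoted above.
est_task_size_gflop = 24216.0    # "Estimated task size 24,216 GFLOPs"
est_app_speed_gflops = 45.92     # "Estimated app speed 45.92 GFLOPs/sec"
elapsed_s = hms_to_seconds("06:29:50")

# Expected runtime = work to do / rate at which the client thinks it gets done.
expected_s = est_task_size_gflop / est_app_speed_gflops
overrun = elapsed_s / expected_s

print(f"expected runtime: {expected_s:.0f} s ({expected_s / 60:.1f} min)")
print(f"elapsed: {elapsed_s} s, about {overrun:.0f}x the estimate")
```

That comes to just under 9 minutes expected, in line with the roughly 8 minutes that healthy tasks take, so 6.5 hours with 0.000% done is about 44 times the estimate.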


Result Log

Result Name: X0930124900141201101191138_ 2--
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
aborted by user
</message>
<stderr_txt>
Commandline: projects/www.worldcommunitygrid.org/wcg_hcc1_img_7.05_windows_intelx86__ati_hcc1 --zipfile X0930124900141201101191138.zip --imagelist images.txt --device 1
<app_init_data>
<major_version>7</major_version>
<minor_version>0</minor_version>
<release>28</release>
<app_version>705</app_version>
<app_name>hcc1</app_name>
<project_preferences>


<color_scheme>Tahiti Sunset</color_scheme>
<max_frames_sec>2000</max_frames_sec>
<max_gfx_cpu_pct>100.0</max_gfx_cpu_pct>
</project_preferences>

<project_dir>C:\ProgramData\BOINC/projects/www.worldcommunitygrid.org</project_dir>
<boinc_dir>C:\ProgramData\BOINC</boinc_dir>
<wu_name>X0930124900141201101191138</wu_name>
<result_name>X0930124900141201101191138_2</result_name>
<comm_obj_name>boinc_5</comm_obj_name>
<slot>7</slot>
<wu_cpu_time>0.000000</wu_cpu_time>
<starting_elapsed_time>0.000000</starting_elapsed_time>
<using_sandbox>0</using_sandbox>
<user_total_credit>9086172.368075</user_total_credit>
<user_expavg_credit>17752.439470</user_expavg_credit>
<host_total_credit>4097885.026987</host_total_credit>
<host_expavg_credit>10258.968641</host_expavg_credit>
<resource_share_fraction>1.000000</resource_share_fraction>
<checkpoint_period>60.000000</checkpoint_period>
<fraction_done_start>0.000000</fraction_done_start>
<fraction_done_end>1.000000</fraction_done_end>
<gpu_type>ATI</gpu_type>
<gpu_device_num>1</gpu_device_num>
<gpu_opencl_dev_index>1</gpu_opencl_dev_index>
<ncpus>1.000000</ncpus>
<rsc_fpops_est>24215727646969.000000</rsc_fpops_est>
<rsc_fpops_bound>1210786382348450.000000</rsc_fpops_bound>
<rsc_memory_bound>78643200.000000</rsc_memory_bound>
<rsc_disk_bound>50000000.000000</rsc_disk_bound>
<computation_deadline>1363736529.000000</computation_deadline>
<vbox_window>0</vbox_window>
</app_init_data>
INFO: gpu_type set in init_data.xml to ATI
INFO: gpu_device_num set in init_data.xml to 1
Boinc requested ATI gpu device number1
Unzipping input images ../../projects/www.worldcommunitygrid.org/X0930124900141201101191138_X0930124900141201101191138.zip
Processing jobdescription
Number of Images defined in image list is 2
Found compute platform Advanced Micro Devices, Inc.
Selecting this platform
CL_DEVICE_NAME: Cypress
CL_DEVICE_VENDOR: Advanced Micro Devices, Inc.
CL_DEVICE_VERSION: 1124.2 (VM)
CL_DEVICE_MAX_COMPUTE_UNITS: 
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 256 / 256 / 256
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
CL_DEVICE_MAX_CLOCK_FREQUENCY: 725 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 512 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 1024 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 32 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_EXTENSIONS:
cl_khr_fp64
cl_amd_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_3d_image_writes
cl_khr_byte_addressable_store
cl_khr_gl_sharing
cl_ext_atomic_counters_32
cl_amd_device_attribute_query
cl_amd_vec3
cl_amd_printf
cl_amd_media_ops
cl_amd_media_ops2
cl_amd_popcnt
cl_khr_d3d10_sharing
Estimated kernel execution time = 0.46595 [sec]
Starting analysis of X0930124900141201101191138.jp2...
Extracting GLCM features...

</stderr_txt>
]]>
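
The two rsc_fpops fields in the app_init_data above can be read in a similar way. This is a minimal sketch, assuming (my understanding of the BOINC client, not something the log states) that the client stops a task for exceeding its elapsed time limit once it has run longer than rsc_fpops_bound divided by the app's estimated speed. The 45.92 GFLOPs/sec figure comes from the task properties earlier in the post, and the `field` helper is hypothetical:

```python
import re

# The two fields copied from the <app_init_data> block in the result log.
init_data = """
<rsc_fpops_est>24215727646969.000000</rsc_fpops_est>
<rsc_fpops_bound>1210786382348450.000000</rsc_fpops_bound>
"""


def field(name: str, text: str) -> float:
    """Return the numeric value of a simple <name>value</name> element."""
    match = re.search(rf"<{name}>([^<]+)</{name}>", text)
    if match is None:
        raise KeyError(name)
    return float(match.group(1))


fpops_est = field("rsc_fpops_est", init_data)
fpops_bound = field("rsc_fpops_bound", init_data)

# How many times the estimated work the task is allowed to consume.
ratio = fpops_bound / fpops_est

# Assumption: elapsed time limit = rsc_fpops_bound / estimated app speed.
app_speed_flops = 45.92e9  # "Estimated app speed 45.92 GFLOPs/sec"
limit_s = fpops_bound / app_speed_flops

print(f"bound / estimate = {ratio:.0f}x")
print(f"assumed elapsed time limit: {limit_s / 3600:.1f} hours")
```

If that assumption holds, the limit is roughly 7.3 hours, so at 6.5 hours these tasks had not yet reached the point where the client would have stopped them on its own.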



Result Log

Result Name: X0930124900157201101191137_ 2--
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
aborted by user
</message>
<stderr_txt>
Commandline: projects/www.worldcommunitygrid.org/wcg_hcc1_img_7.05_windows_intelx86__ati_hcc1 --zipfile X0930124900157201101191137.zip --imagelist images.txt --device 0
<app_init_data>
<major_version>7</major_version>
<minor_version>0</minor_version>
<release>28</release>
<app_version>705</app_version>
<app_name>hcc1</app_name>
<project_preferences>


<color_scheme>Tahiti Sunset</color_scheme>
<max_frames_sec>2000</max_frames_sec>
<max_gfx_cpu_pct>100.0</max_gfx_cpu_pct>
</project_preferences>

<project_dir>C:\ProgramData\BOINC/projects/www.worldcommunitygrid.org</project_dir>
<boinc_dir>C:\ProgramData\BOINC</boinc_dir>
<wu_name>X0930124900157201101191137</wu_name>
<result_name>X0930124900157201101191137_2</result_name>
<comm_obj_name>boinc_1</comm_obj_name>
<slot>0</slot>
<wu_cpu_time>0.000000</wu_cpu_time>
<starting_elapsed_time>0.000000</starting_elapsed_time>
<using_sandbox>0</using_sandbox>
<user_total_credit>9086172.368075</user_total_credit>
<user_expavg_credit>17752.439470</user_expavg_credit>
<host_total_credit>4097885.026987</host_total_credit>
<host_expavg_credit>10258.968641</host_expavg_credit>
<resource_share_fraction>1.000000</resource_share_fraction>
<checkpoint_period>60.000000</checkpoint_period>
<fraction_done_start>0.000000</fraction_done_start>
<fraction_done_end>1.000000</fraction_done_end>
<gpu_type>ATI</gpu_type>
<gpu_device_num>0</gpu_device_num>
<gpu_opencl_dev_index>0</gpu_opencl_dev_index>
<ncpus>1.000000</ncpus>
<rsc_fpops_est>24215727646969.000000</rsc_fpops_est>
<rsc_fpops_bound>1210786382348450.000000</rsc_fpops_bound>
<rsc_memory_bound>78643200.000000</rsc_memory_bound>
<rsc_disk_bound>50000000.000000</rsc_disk_bound>
<computation_deadline>1363736529.000000</computation_deadline>
<vbox_window>0</vbox_window>
</app_init_data>
INFO: gpu_type set in init_data.xml to ATI
INFO: gpu_device_num set in init_data.xml to 0
Boinc requested ATI gpu device number0
Unzipping input images ../../projects/www.worldcommunitygrid.org/X0930124900157201101191137_X0930124900157201101191137.zip
Processing jobdescription
Number of Images defined in image list is 2
Found compute platform Advanced Micro Devices, Inc.
Selecting this platform
CL_DEVICE_NAME: Cypress
CL_DEVICE_VENDOR: Advanced Micro Devices, Inc.
CL_DEVICE_VERSION: 1124.2 (VM)
CL_DEVICE_MAX_COMPUTE_UNITS: 
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 256 / 256 / 256
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
CL_DEVICE_MAX_CLOCK_FREQUENCY: 725 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 512 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 1024 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 32 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_EXTENSIONS:
cl_khr_fp64
cl_amd_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_3d_image_writes
cl_khr_byte_addressable_store
cl_khr_gl_sharing
cl_ext_atomic_counters_32
cl_amd_device_attribute_query
cl_amd_vec3
cl_amd_printf
cl_amd_media_ops
cl_amd_media_ops2
cl_amd_popcnt
cl_khr_d3d10_sharing
Estimated kernel execution time = 0.37351 [sec]
Starting analysis of X0930124900157201101191137.jp2...
Extracting GLCM features...

</stderr_txt>
]]>
[Mar 18, 2013 12:55:50 AM]
RaymondFO
Veteran Cruncher
USA
Joined: Nov 30, 2004
Post Count: 561
Status: Offline
Re: Two Frozen Applications: hcc1 7.05 (ati_hcc1)

This happens to me when the ATI video driver crashes and recovers. After the recovery, the GPU tasks never crunch again until the computer is rebooted. If this occurs frequently, you may want to uninstall and reinstall the driver. Remember to reboot after uninstalling the driver, and again after reinstalling it, so that the new driver is fully operational.
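
One way to check whether a driver crash and recovery happened around the time the tasks froze is to look in the Windows System event log. The sketch below assumes the recovery is recorded as Event ID 4101 ("Display driver stopped responding and has recovered"), which is how Windows 7 normally logs it, and uses the stock `wevtutil` tool. The function name is just for illustration:

```python
import platform
import subprocess


def tdr_query_command(max_events: int = 5) -> list[str]:
    """Build a wevtutil command listing the newest display-driver recovery events."""
    return [
        "wevtutil", "qe", "System",
        "/q:*[System[(EventID=4101)]]",  # assumption: recoveries are logged as Event ID 4101
        f"/c:{max_events}",              # limit the number of events returned
        "/rd:true",                      # newest events first
        "/f:text",                       # plain-text output
    ]


# Only attempt to run the query on Windows, where wevtutil exists.
if __name__ == "__main__" and platform.system() == "Windows":
    result = subprocess.run(tdr_query_command(), capture_output=True, text=True)
    print(result.stdout or "no matching events found")
```

If the newest event's timestamp falls between the task start and the abort in the BOINC Event Log, a driver reset is a plausible cause of the freeze.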
[Mar 18, 2013 2:59:35 AM]
captainjack
Advanced Cruncher
Joined: Apr 14, 2008
Post Count: 144
Status: Offline
Re: Two Frozen Applications: hcc1 7.05 (ati_hcc1)

This happens to me when BOINC runs CPU benchmarks while an HCC GPU task is running, about once a week. You should be able to look back through the Event Log messages and see whether CPU benchmarks ran while the errant WUs were running.

If I remember correctly, the WCG admins know about this and are looking into it. In the meantime, all we can do is abort the stuck tasks and start new ones.
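
For anyone who wants to do that check on a saved copy of the Event Log, here is a rough sketch. It assumes the "timestamp | project | message" layout seen in this thread, and that benchmark runs show up as a message containing the word "benchmark". The exact wording may differ between client versions, and the function names are made up for the example:

```python
from datetime import datetime


def parse_line(line: str):
    """Split 'timestamp | project | message' into (datetime, message)."""
    stamp, _project, message = (part.strip() for part in line.split("|", 2))
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), message


def benchmarks_during_task(lines, task_name):
    """Return timestamps of benchmark messages logged between a task's start and its abort."""
    start = end = None
    benchmarks = []
    for line in lines:
        if line.count("|") < 2:
            continue  # not an Event Log line
        try:
            when, message = parse_line(line)
        except ValueError:
            continue  # first column is not a timestamp
        if task_name in message:
            if message.startswith("Starting task"):
                start = when
            elif "aborted by user" in message:
                end = when
        elif "benchmark" in message.lower():  # assumed wording of the benchmark message
            benchmarks.append(when)
    if start is None or end is None:
        return []
    return [when for when in benchmarks if start <= when <= end]


# The two Event Log lines quoted in the first post for one of the stuck tasks.
log_lines = [
    "3/17/2013 9:56:29 AM | World Community Grid | Starting task X0930124900157201101191137_2 using hcc1 version 705 (ati_hcc1) in slot 0",
    "3/17/2013 4:28:03 PM | World Community Grid | task X0930124900157201101191137_2 aborted by user",
]
print(benchmarks_during_task(log_lines, "X0930124900157201101191137_2"))
```

Run against only the two lines quoted in the first post it finds nothing, because that excerpt leaves out everything logged between the start and the abort. The full Event Log for that window is what would need to be fed in.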
[Mar 18, 2013 3:12:28 AM]