Debugging log: errors hit while running the grading code

1. AttributeError: module pytorch_lightning has no attribute LightningDataModule

I searched a lot and found no answer. It turned out that installing a newer pytorch-lightning version fixed it:

pip install pytorch-lightning==1.5.10
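To confirm that the environment actually picked up the reinstalled package, a quick check along these lines can help. This is only a sketch: `has_attribute` is a hypothetical helper name, not part of pytorch-lightning, and it relies only on the standard library.

```python
import importlib


def has_attribute(module_name, attribute):
    """Return True if the module imports cleanly and exposes the attribute.

    Hypothetical helper for illustration; returns False when the module
    is not installed in the current environment.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)
```

Running `has_attribute("pytorch_lightning", "LightningDataModule")` in the same environment that launches training should return True once the install has taken effect.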

2. ValueError: Function has keyword-only parameters or annotations, use inspect.signature() API which can support them

File "/project/xuling/Glioma-Seg-and-Det/methods/utils/utils.py", line 13, in initialize_class
    class_args = inspect.getargspec(class_name.__init__).args[1:]
File "/home/xuling/project/anaconda3/envs/torch/lib/python3.9/inspect.py", line 1122, in getargspec
    raise ValueError("Function has keyword-only parameters or annotations"
ValueError: Function has keyword-only parameters or annotations, use inspect.signature() API which can support them

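Replacing inspect.getargspec on line 13 of /project/xuling/Glioma-Seg-and-Det/methods/utils/utils.py fixed it (the name of the replacement function is missing from these notes). Switching to inspect.signature, as the error message suggests, still raised an error.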
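Since the note does not name the replacement, the following is only a sketch of one option that is known to work for this kind of call: `inspect.getfullargspec`. It accepts keyword-only parameters and annotations and still exposes `.args`, so the `.args[1:]` slice keeps working. By contrast, `inspect.signature` returns a `Signature` object with no `.args` attribute, so a plain rename that leaves `.args[1:]` in place would fail with an AttributeError; whether that is the error the author saw is an assumption. The class `Example` and both helper functions are hypothetical names used for illustration.

```python
import inspect


class Example:
    # Hypothetical class: the annotations and the keyword-only parameter
    # are exactly what makes the legacy inspect.getargspec raise ValueError.
    def __init__(self, in_channels: int, *, dropout: float = 0.1):
        self.in_channels = in_channels
        self.dropout = dropout


def init_arg_names(cls):
    """Positional parameter names of cls.__init__, without self."""
    return inspect.getfullargspec(cls.__init__).args[1:]


def init_param_names(cls):
    """All parameter names of cls.__init__ via inspect.signature, without self."""
    parameters = inspect.signature(cls.__init__).parameters
    return [name for name in parameters if name != "self"]
```

Note the difference between the two: `getfullargspec(...).args` lists only positional parameters (keyword-only names live in `.kwonlyargs`), while `signature(...).parameters` lists every parameter in order.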

3. RuntimeError: CUDA error: device-side assert triggered

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/tmp/slurmd/job1258033/slurm_script: line 10: 20993 Aborted (core dumped) python train.py

/opt/conda/conda-bld/pytorch_1646755849709/work/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [16,0,0], thread: [19,0,0] Assertion `input_val >= zero && input_val <= one` failed.

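Both messages came from the same run, and both are really out-of-range errors. On closer inspection, one sample contained NaN values after preprocessing, so the values passed through the network ended up out of range. Deleting that sample solved the problem.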
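A NaN fails both comparisons in the assertion (`input_val >= zero` and `input_val <= one`), which is why a single bad sample is enough to trigger it. Scanning the preprocessed data before training can locate such samples without waiting for the CUDA assert. The sketch below uses only the standard library and assumes the samples can be viewed as nested lists of floats keyed by name; `find_non_finite_samples` and `iter_values` are hypothetical helpers, and for real arrays or tensors the analogous checks would be `numpy.isfinite` or `torch.isfinite`.

```python
import math


def iter_values(data):
    """Yield every scalar from an arbitrarily nested list or tuple."""
    if isinstance(data, (list, tuple)):
        for item in data:
            yield from iter_values(item)
    else:
        yield data


def find_non_finite_samples(samples):
    """Return the names of samples that contain NaN or infinite values.

    `samples` maps a sample name to nested lists of floats (an assumed
    layout, chosen only to keep the sketch dependency-free).
    """
    bad_names = []
    for name, data in samples.items():
        if any(not math.isfinite(value) for value in iter_values(data)):
            bad_names.append(name)
    return bad_names
```

The returned names can then be excluded from the dataset, or the preprocessing step that produced them can be inspected directly.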
