NVIDIA/Troubleshooting

From ArchWiki


Failure to start

System will not boot after driver was installed

If after installing the NVIDIA driver your system becomes stuck before reaching the display manager, try to disable kernel mode setting.

Xorg fails to load or Red Screen of Death

If you get a red screen and use GRUB, disable the GRUB framebuffer by editing /etc/default/grub and uncommenting GRUB_TERMINAL_OUTPUT=console. For more information, see GRUB/Tips and tricks#Disable framebuffer.

Black screen at X startup / Machine poweroff at X shutdown

If you have installed an NVIDIA update and your screen stays black after launching Xorg, or if shutting down Xorg causes a machine poweroff, try the following workaround:

# modprobe nvidia

Screen(s) found, but none have a usable configuration

Sometimes NVIDIA and X have trouble finding the active screen. If your graphics card has multiple outputs, try plugging your monitor into one of the others. On a laptop, this may be because your graphics card has a VGA/TV output. Xorg.0.log will provide more information.

Another thing to try is adding an invalid Option "ConnectedMonitor" to Section "Device" to force Xorg to throw an error that shows you how to correct it. See the documentation for more information about the ConnectedMonitor setting.

After restarting X, check Xorg.0.log for the valid CRT-x, DFP-x and TV-x values.

The output of nvidia-xconfig --query-gpu-info may also be helpful.

X fails with "Failing initialization of X screen"

If /var/log/Xorg.0.log says that the X server fails to initialize the screen:

(EE) NVIDIA(G0): GPU screens are not yet supported by the NVIDIA driver
(EE) NVIDIA(G0): Failing initialization of X screen

and nvidia-smi reports No running processes found, first reinstall the latest nvidia-utils. Then copy /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf to /etc/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf, edit the copy and add the line Option "PrimaryGPU" "yes" to its NVIDIA OutputClass section. Restart the computer afterwards; this should fix the problem.
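The copy-and-edit steps above can be scripted. The following is a minimal sketch: add_primary_gpu is a hypothetical helper, and it assumes that the NVIDIA OutputClass section of the shipped file contains a line of the form MatchDriver "nvidia-drm", after which the option is inserted. Check your copy of the file first; the function reports failure if that line is not found.

```sh
#!/bin/sh
# Sketch: copy the shipped OutputClass snippet and add Option "PrimaryGPU" "yes".
# add_primary_gpu is a hypothetical helper; run it as root after reinstalling nvidia-utils.
# ASSUMPTION: the NVIDIA OutputClass section contains a MatchDriver "nvidia-drm" line.
add_primary_gpu() {
    src=${1:-/usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf}
    dst=${2:-/etc/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf}

    mkdir -p "$(dirname "$dst")" || return 1

    # Print every line; right after the NVIDIA MatchDriver line, also print the option.
    awk '{ print }
         /MatchDriver[ \t]+"nvidia-drm"/ { print "    Option \"PrimaryGPU\" \"yes\"" }' \
        "$src" > "$dst" || return 1

    # Fail loudly if the assumed MatchDriver line was not present.
    if ! grep -q '"PrimaryGPU"' "$dst"; then
        echo "PrimaryGPU not inserted; edit $dst by hand" >&2
        return 1
    fi
}
```

Called without arguments it uses the two paths named above; the optional source and destination arguments mainly make it possible to try the function on scratch files first.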

Xorg fails during boot, but otherwise starts fine

On very fast booting systems, systemd may attempt to start the display manager before the NVIDIA driver has fully initialized. You will see a message like the following in your logs only when Xorg runs during boot.

/var/log/Xorg.0.log
[     1.807] (EE) NVIDIA(0): Failed to initialize the NVIDIA kernel module. Please see the
[     1.807] (EE) NVIDIA(0):     system's kernel log for additional error messages and
[     1.808] (EE) NVIDIA(0):     consult the NVIDIA README for details.
[     1.808] (EE) NVIDIA(0):  *** Aborting ***

In this case you will need to establish an ordering dependency from the display manager to the DRI device. First create device units for DRI devices by creating a new udev rules file.

/etc/udev/rules.d/99-systemd-dri-devices.rules
ACTION=="add", KERNEL=="card*", SUBSYSTEM=="drm", TAG+="systemd"

Then create dependencies from the display manager to the device(s).

/etc/systemd/system/display-manager.service.d/10-wait-for-dri-devices.conf
[Unit]
Wants=dev-dri-card0.device
After=dev-dri-card0.device

If additional cards are needed for the desktop, list them in Wants and After, separated by spaces.
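Listing several cards by hand is easy to get wrong, so here is a small sketch that prints the drop-in for any number of cards. dri_dropin is a hypothetical helper; it assumes the usual mapping of /dev/dri/cardN to the device unit dev-dri-cardN.device (systemd-escape --path --suffix=device /dev/dri/card1 prints the exact name if in doubt).

```sh
#!/bin/sh
# Sketch: print the display-manager drop-in for one or more DRI cards.
# dri_dropin is a hypothetical helper; arguments are names under /dev/dri (card0, card1, ...).
dri_dropin() {
    units=
    for card in "$@"; do
        # /dev/dri/cardN -> dev-dri-cardN.device, joined with single spaces
        units="${units:+$units }dev-dri-$card.device"
    done
    printf '[Unit]\nWants=%s\nAfter=%s\n' "$units" "$units"
}
```

For example, in a root shell where the function is defined:

# mkdir -p /etc/systemd/system/display-manager.service.d
# dri_dropin card0 card1 > /etc/systemd/system/display-manager.service.d/10-wait-for-dri-devices.conf
# systemctl daemon-reload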

Black screen on systems with integrated GPU

If you have a system with an integrated GPU (e.g. Intel HD 4000, VIA VX820 Chrome 9 or AMD Cezanne) and have installed the nvidia package, you may experience a black screen on boot, when changing virtual terminal, or when exiting an X session. This may be caused by a conflict between the graphics modules. This is solved by blacklisting the relevant GPU modules. Create the file /etc/modprobe.d/blacklist.conf and prevent the relevant modules from loading on boot:

/etc/modprobe.d/blacklist.conf
install i915 /usr/bin/false
install intel_agp /usr/bin/false
install viafb /usr/bin/false
install radeon /usr/bin/false
install amdgpu /usr/bin/false

X fails with "no screens found" when using Multiple GPUs

In situations where you might have multiple GPUs on a system and X fails to start with:

[ 76.633] (EE) No devices detected.
[ 76.633] Fatal server error:
[ 76.633] no screens found

then you need to add your discrete card's BusID to your X configuration. This can happen on systems with an Intel CPU and an integrated GPU or if you have more than one NVIDIA card connected. Find your BusID:

# lspci | grep -E "VGA|3D controller"
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GTX 650] (rev a1)
08:00.0 3D controller: NVIDIA Corporation GM108GLM [Quadro K620M / Quadro M500M] (rev a2)

Then add it to the card's Device section in your X configuration. For example:

/etc/X11/xorg.conf.d/10-nvidia.conf
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:1:0:0"
EndSection
Note: BusID formatting is important!

In the example above, 01:00.0 is simply written as 1:0:0, but some conversions are more complicated: lspci prints the bus, device and function numbers in hexadecimal, while X configuration files expect decimal. Any value greater than 9 must therefore be converted to decimal.

For example, 5e:00.0 from lspci becomes PCI:94:0:0.
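The conversion can be done with printf, which accepts 0x-prefixed hexadecimal arguments for %d. A minimal sketch; lspci_to_busid is a hypothetical helper that also drops an optional domain prefix such as 0000: (as printed by lspci -D), since the BusID format shown above does not include it.

```sh
#!/bin/sh
# Sketch: convert an lspci address (hexadecimal) into an Xorg BusID (decimal).
# lspci_to_busid is a hypothetical helper name.
lspci_to_busid() {
    addr=$1
    # Drop an optional PCI domain prefix such as "0000:"
    case $addr in
        *:*:*) addr=${addr#*:} ;;
    esac
    bus=${addr%%:*}    # "5e" from "5e:00.0"
    devfn=${addr#*:}   # "00.0"
    dev=${devfn%%.*}   # "00"
    fn=${devfn#*.}     # "0"
    # %d with a 0x-prefixed argument prints the hexadecimal value in decimal
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}
```

For example, lspci_to_busid 5e:00.0 prints PCI:94:0:0, and lspci_to_busid 01:00.0 prints PCI:1:0:0.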

Modprobe Error: "Could not insert 'nvidia': No such device" on linux >=4.8

With Linux 4.8, you may get the following errors when trying to use the discrete card:

$ modprobe nvidia -vv
modprobe: INFO: custom logging function 0x409c10 registered
modprobe: INFO: Failed to insert module '/lib/modules/4.8.6-1-ARCH/extramodules/nvidia.ko.gz': No such device
modprobe: ERROR: could not insert 'nvidia': No such device
modprobe: INFO: context 0x24481e0 released
insmod /lib/modules/4.8.6-1-ARCH/extramodules/nvidia.ko.gz 
# dmesg
...
NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:139b)
NVRM: installed in this system is not supported by the 370.28
NVRM: NVIDIA Linux driver release.  Please see 'Appendix
NVRM: A - Supported NVIDIA GPU Products' in this release's
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
...

This problem is caused by bad commits pertaining to PCIe power management in the Linux kernel (as documented in this NVIDIA DevTalk thread).

The workaround is to add pcie_port_pm=off to your kernel parameters. Note that this disables PCIe power management for all devices.
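After rebooting, you can confirm that the parameter is actually active by checking /proc/cmdline. A minimal sketch; has_kernel_param is a hypothetical helper that simply splits the command line on spaces, so parameters containing quoted spaces are not handled.

```sh
#!/bin/sh
# Sketch: check whether an exact kernel parameter is present on the running kernel.
# has_kernel_param is a hypothetical helper; the optional second argument replaces /proc/cmdline.
has_kernel_param() {
    # One parameter per line, then look for a whole-line, fixed-string match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example:

$ has_kernel_param pcie_port_pm=off && echo active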

System does not return from suspend

If the system does not return from suspend, you may see the following in the log:

kernel: nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
kernel: nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
kernel: nvidia-modeset: WARNING: GPU:0: Failure processing EDID for display device DELL U2412M (DP-0).
kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DELL U2412M (DP-0)
kernel: nvidia-modeset: ERROR: GPU:0: Failure reading maximum pixel clock value for display device DELL U2412M (DP-0).

A possible solution based on [1]:

Run this command to get the version string:

# strings /sys/firmware/acpi/tables/DSDT | grep -i 'windows ' | sort | tail -1

Add the acpi_osi=! "acpi_osi=version" kernel parameter to your boot loader configuration.
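The two steps can be combined into one pipeline that prints the parameters to add. This is a sketch under the assumption that the version you want is the one selected by the command above (the last Windows string after sorting); acpi_osi_params is a hypothetical helper that reads the strings output on standard input.

```sh
#!/bin/sh
# Sketch: build the acpi_osi kernel parameters from DSDT strings read on stdin.
# acpi_osi_params is a hypothetical helper.
acpi_osi_params() {
    # Same selection as the command above: last "Windows ..." string after sorting
    latest=$(grep -i 'windows ' | sort | tail -n 1)
    if [ -z "$latest" ]; then
        echo "no Windows version string found" >&2
        return 1
    fi
    printf 'acpi_osi=! "acpi_osi=%s"\n' "$latest"
}
```

For example:

# strings /sys/firmware/acpi/tables/DSDT | acpi_osi_params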

Another possible cause of the issue is the use of the nvidia-open package.

Black screen returning from suspend

If you experience a black screen and your logs contain:

archlinux kernel: NVRM: GPU at PCI:0000:08:00: GPU-926ecdb0-adb1-6ee9-2fad-52e7214c5011
archlinux kernel: NVRM: Xid (PCI:0000:08:00): 13, pid='<unknown>', name=<unknown>, Graphi>
archlinux kernel: NVRM: Xid (PCI:0000:08:00): 13, pid='<unknown>', name=<unknown>, Graphi>
archlinux kernel: NVRM: Xid (PCI:0000:08:00): 13, pid='<unknown>', name=<unknown>, Graphi>
archlinux kernel: NVRM: Xid (PCI:0000:08:00): 13, pid='<unknown>', name=<unknown>, Graphi>
archlinux kernel: NVRM: Xid (PCI:0000:08:00): 13, pid='<unknown>', name=<unknown>, Graphi>

You need to enable the NVIDIA suspend, hibernate and sleep services as explained in NVIDIA/Tips and tricks#Preserve video memory after suspend.

Crashes and hangs

Crashing in general

  • Try disabling RenderAccel in xorg.conf.
  • If Xorg outputs an error about "conflicting memory type" or "failed to allocate primary buffer: out of memory", or crashes with a "Signal 11" while using nvidia-96xx drivers, add nopat to your kernel parameters.
  • If the NVIDIA compiler complains about a mismatch between the current GCC version and the one used to compile the kernel, add the following to /etc/profile:
export IGNORE_CC_MISMATCH=1
  • If fullscreen applications are freezing or crashing, try enabling Display Compositing and Direct fullscreen rendering options in your desktop environment's settings.

Visual glitches, hangs and errors in OpenGL applications

Recent CPUs (Intel Sandy Bridge (2011) and later, AMD Zen (2017) and later) have a micro-operation cache. Using it can lead to problems with NVIDIA's OpenGL driver due to cache aliasing [2]. You can usually disable the micro-op cache in your system's BIOS, but this comes at the cost of performance [3]. Disabling it also helps with the most severe graphical glitches in Xwayland applications, although it does not fully solve the problem [4].

Kernel panic when updating and/or rebooting the system

This is a known bug in the NVIDIA 550 series drivers [5]. The cause is not yet known, but it only appears to affect laptops. See BBS#293400 for more details.

To work around this issue, switch to nvidia-open-dkms if your hardware supports it; otherwise, use nvidia-535xx-dkms (AUR) instead.

Visual issues

Avoid screen tearing

Note:
  • This has been reported to reduce the performance of some OpenGL applications and may produce issues in WebGL. It also drastically increases the time the driver needs to clock down after load (NVIDIA Support Thread).
  • ForceFullCompositionPipeline is known to break some games using Vulkan under Proton with NVIDIA driver 535.

Tearing can be avoided by forcing a full composition pipeline, regardless of the compositor you are using. To test whether this option will work, run:

$ nvidia-settings --assign CurrentMetaMode="nvidia-auto-select +0+0 { ForceFullCompositionPipeline = On }"

Alternatively, in nvidia-settings, click the Advanced button on the X Server Display Configuration page, select either Force Composition Pipeline or Force Full Composition Pipeline, and click Apply.

To make the change permanent, it must be added to the "Screen" section of the Xorg configuration file. When making this change, triple buffering (TripleBuffer) should be enabled and AllowIndirectGLXProtocol should be disabled in the driver configuration as well. See the example configuration below:

/etc/X11/xorg.conf.d/20-nvidia.conf
Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName  "GeForce GTX 1050 Ti"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    Option         "ForceFullCompositionPipeline" "on"
    Option         "AllowIndirectGLXProtocol" "off"
    Option         "TripleBuffer" "on"
EndSection

If you do not have an Xorg configuration file, you can create one for your present hardware using nvidia-xconfig (see NVIDIA#Automatic configuration) and move it from /etc/X11/xorg.conf to the preferred location /etc/X11/xorg.conf.d/20-nvidia.conf.

Note: Many of the options that nvidia-xconfig writes to 20-nvidia.conf are set automatically by the driver and are not needed. If you only use this file to enable the composition pipeline, only the "Screen" section with its Identifier and Option lines is necessary; the other sections may be removed.

Multi-monitor

For multi-monitor setup you will need to specify ForceCompositionPipeline=On for each display. For example:

$ nvidia-settings --assign CurrentMetaMode="DP-2: nvidia-auto-select +0+0 {ForceCompositionPipeline=On}, DP-4: nvidia-auto-select +3840+0 {ForceCompositionPipeline=On}"

Otherwise, the nvidia-settings command will disable your secondary display.

You can get the current screen names and offsets using --query:

$ nvidia-settings --query CurrentMetaMode

The first command above is for two 3840x2160 monitors connected to DP-2 and DP-4. Read the correct CurrentMetaMode for your setup (for example by exporting xorg.conf) and append ForceCompositionPipeline to each of your displays. Setting ForceCompositionPipeline only affects the targeted display.
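Appending the option by hand gets tedious with more monitors. The following sketch rewrites a MetaMode string so that every display gets ForceCompositionPipeline=On. add_fcp and fcp_entry are hypothetical helpers, not part of nvidia-settings. They assume that displays are separated by commas outside {...}, and that a full query result, if passed in, has the form Attribute ... :: MetaMode (everything up to the first ":: " is dropped); verify this against your own --query output.

```sh
#!/bin/sh
# Sketch: add ForceCompositionPipeline=On to every display in a MetaMode string.
# add_fcp and fcp_entry are hypothetical helpers, not part of nvidia-settings.

# Add the option to one display entry such as "DP-2: nvidia-auto-select +0+0".
fcp_entry() {
    e=$1
    e=${e#"${e%%[! ]*}"}   # trim leading spaces
    e=${e%"${e##*[! ]}"}   # trim trailing spaces
    case $e in
        *'{'*) # entry already has {...}: insert the option after the first "{"
            printf '%s\n' "$e" | sed 's/{/{ForceCompositionPipeline=On, /' ;;
        *)  # no option list yet: append one
            printf '%s {ForceCompositionPipeline=On}\n' "$e" ;;
    esac
}

# ASSUMPTION: a full query result looks like "Attribute ... :: <MetaMode>".
add_fcp() {
    rest=${1#*:: }   # drop everything up to ":: ", if present
    out= entry= depth=0
    # Walk the string one character at a time, splitting on commas outside {...}.
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}   # first character
        rest=${rest#?}          # remainder
        case $c in
            '{') depth=$((depth + 1)) ;;
            '}') depth=$((depth - 1)) ;;
        esac
        if [ "$c" = , ] && [ "$depth" -eq 0 ]; then
            out="$out$(fcp_entry "$entry"), "
            entry=
        else
            entry="$entry$c"
        fi
    done
    printf '%s%s\n' "$out" "$(fcp_entry "$entry")"
}
```

If the assumptions hold on a single X screen, the result can be passed straight back:

$ nvidia-settings --assign CurrentMetaMode="$(add_fcp "$(nvidia-settings --query CurrentMetaMode)")"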

Tip: Multi-monitor setups using different monitor models may have slightly different refresh rates. If vsync is enabled by the driver, it will sync to only one of these refresh rates, which can cause the appearance of screen tearing on the other monitors. Choose your primarily used monitor as the sync display device, since the others will not sync properly. This is configurable in ~/.nvidia-settings-rc as 0/XVideoSyncToDisplayID= or through the graphical configuration options of nvidia-settings.

Screen corruption after resuming from suspend or hibernation

This also applies if an external monitor does not wake up after suspend or hibernation.

See NVIDIA/Tips and tricks#Preserve video memory after suspend

A screen corruption bug after suspend when using GDM was fixed in driver version 515.43.04 [6].

Corrupted screen: "Six screens" Problem

For some users with a GeForce GT 100M, the screen gets corrupted after X starts: it is divided into six sections and the resolution is limited to 640x480. The same problem has been reported with the Quadro 2000 and high-resolution displays.

To solve this problem, enable the NoTotalSizeCheck mode validation option in the Device section:

Section "Device"
 ...
 Option "ModeValidation" "NoTotalSizeCheck"
 ...
EndSection

Invisible text and icons with nvidia-470

An update of GTK4 caused an issue for users of the nvidia-470 driver for legacy cards: after the update, text and icons randomly disappear and only reappear after hovering the mouse over the windows.[7]

See the forum for workarounds.

Fix graphical corruption in GNOME Shell when resuming from sleep

If you see strange fonts and/or weird graphical glitches in GNOME Shell when resuming from sleep, try setting the following kernel parameter to enable dynamic power management:

nvidia.NVreg_DynamicPowerManagement=0x02

More info: https://download.nvidia.com/XFree86/Linux-x86_64/560.35.03/README/dynamicpowermanagement.html

Performance issues

Bad performance after installing a new driver version

If the frame rate has dropped compared with older drivers, check whether direct rendering is enabled (glxinfo is included in mesa-utils):

$ glxinfo | grep direct

If the command prints:

direct rendering: No

A possible solution is to downgrade to the previously installed driver version and reboot afterwards.

Extreme lag on Xorg

The factual accuracy of this article or section is disputed.

Reason: According to an NVIDIA developer this issue is not specific to GNOME and the rest of the comments on the issue do not mention multi-monitor setups. (Discuss in Talk:NVIDIA/Troubleshooting)

A common issue with Mutter is that animations, video playback and gaming cause extreme desktop lag on Xorg.

See NVIDIA/Tips and tricks#Preserve video memory after suspend.

This should resolve the issue. If it does not, you can mitigate it by adding these options:

/etc/environment
CLUTTER_DEFAULT_FPS=YOUR_MAIN_DISPLAY_REFRESHRATE
__GL_SYNC_DISPLAY_DEVICE=YOUR_MAIN_DISPLAY_OUTPUT_NAME

Then turn off Sync to VBlank and Allow flipping in NVIDIA Settings, and configure NVIDIA Settings to launch on startup with the --load-config-only flag. The desktop will still lag, in particular on a second (or third) monitor, but it should be much better.
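The two /etc/environment values can be read from xrandr. A minimal sketch; primary_env is a hypothetical helper that parses xrandr --query output and assumes that a primary output is set, that its current mode is marked with *, and that CLUTTER_DEFAULT_FPS expects a whole number (the rate is rounded).

```sh
#!/bin/sh
# Sketch: derive CLUTTER_DEFAULT_FPS and __GL_SYNC_DISPLAY_DEVICE from xrandr output.
# primary_env is a hypothetical helper. ASSUMPTIONS: a primary output is set, its
# current mode is marked with "*", and the refresh rate is rounded to an integer.
primary_env() {
    awk '
        / connected primary / { name = $1; inprimary = 1; next }
        /^[^ ]/               { inprimary = 0 }   # next output or screen header
        inprimary {
            # Mode lines: resolution first, then rates; "*" marks the current one
            for (i = 2; i <= NF; i++)
                if ($i ~ /\*/) { rate = $i; sub(/[*+]+$/, "", rate) }
        }
        END {
            if (name == "" || rate == "") exit 1
            printf "CLUTTER_DEFAULT_FPS=%d\n", rate + 0.5
            printf "__GL_SYNC_DISPLAY_DEVICE=%s\n", name
        }'
}
```

Review the result of xrandr --query | primary_env before appending it to /etc/environment as root.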

CPU spikes with 400 series cards

If you are experiencing intermittent CPU spikes with a 400 series card, they may be caused by PowerMizer constantly changing the GPU's clock frequency. To switch PowerMizer's setting from Adaptive to Performance, add the following to the Device section of your Xorg configuration:

 Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x3322; PowerMizerDefaultAC=0x1"

Other issues

Vulkan error on applications start

The factual accuracy of this article or section is disputed.

Reason: Need confirmation by other users (Discuss in Talk:NVIDIA/Troubleshooting)

If an application that requires Vulkan acceleration fails to start with this error:

Vulkan call failed: -4

try deleting the ~/.nv or ~/.cache/nvidia directory.
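The same cleanup as a small function. clear_nv_caches is a hypothetical helper; it only removes the two directories named above and assumes that the driver recreates its caches when needed. Close the affected application first.

```sh
#!/bin/sh
# Sketch: remove the NVIDIA cache directories ~/.nv and ~/.cache/nvidia.
# clear_nv_caches is a hypothetical helper; pass a home directory or default to $HOME.
clear_nv_caches() {
    home=${1:-$HOME}
    [ -n "$home" ] || return 1   # refuse to operate on "/.nv" if HOME is empty
    for dir in "$home/.nv" "$home/.cache/nvidia"; do
        if [ -d "$dir" ]; then
            rm -rf -- "$dir"
            printf 'removed %s\n' "$dir"
        fi
    done
}
```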

No audio over HDMI

Sometimes NVIDIA HDMI audio devices are not listed by:

$ aplay -l

On some new machines, the audio chip on the NVIDIA GPU is disabled at boot. Read more on NVIDIA's website and a forum post.

You need to reload the NVIDIA device with audio enabled. To do so, make sure that your GPU is powered on (on laptops, e.g. with Bumblebee) and that X is not running on it, because the device is going to be reset:

# setpci -s 01:00.0 0x488.l=0x2000000:0x2000000
# rmmod nvidia-drm nvidia-modeset nvidia
# echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
# echo 1 > /sys/bus/pci/devices/0000:00:01.0/rescan
# modprobe nvidia-drm
# xinit -- -retro

If your TTY runs on the NVIDIA GPU, put these commands in a script so that they all complete even after your screen goes dark.
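A minimal sketch of such a script, using the commands above unchanged. The addresses 0000:01:00.0 (GPU) and 0000:00:01.0 (bridge) are the example values from above and are assumptions here; take yours from lspci -D. As a safety measure, the sketch only prints the commands unless DRY_RUN=0 is set, and it stops if an earlier step fails.

```sh
#!/bin/sh
# Sketch: the HDMI audio re-enable sequence as one script.
# ASSUMED addresses from the example above; override GPU and BRIDGE for your system.
# DRY_RUN defaults to 1 (print only); run as root with DRY_RUN=0 to execute.
GPU=${GPU:-0000:01:00.0}        # NVIDIA GPU
BRIDGE=${BRIDGE:-0000:00:01.0}  # PCI bridge the GPU sits behind
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

hdmi_audio_reset() {
    run setpci -s "$GPU" 0x488.l=0x2000000:0x2000000 || return 1   # enable audio
    run rmmod nvidia-drm nvidia-modeset nvidia || return 1
    run sh -c "echo 1 > /sys/bus/pci/devices/$GPU/remove" || return 1
    run sh -c "echo 1 > /sys/bus/pci/devices/$BRIDGE/rescan" || return 1
    run modprobe nvidia-drm
    # Afterwards, start X again, e.g. with xinit -- -retro as above.
}

hdmi_audio_reset
```

For example, saved under the hypothetical name hdmi-audio.sh:

# DRY_RUN=0 sh hdmi-audio.sh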

Backlight is not turning off in some occasions

By default, DPMS should turn off the backlight once the configured timeouts expire or when you run xset. However, probably due to a bug in the proprietary NVIDIA driver, the result is a blank screen with no power saving whatsoever. Until the bug is fixed, you can work around it by running vbetool as root.

Install the vbetool package.

To turn off your screen on demand and turn the backlight back on by pressing any key:

# vbetool dpms off && read -n1; vbetool dpms on

Alternatively, xrandr can disable and re-enable monitor outputs without requiring root:

$ xrandr --output DP-1 --off; read -n1; xrandr --output DP-1 --auto


HardDPMS

This article or section needs expansion.

Reason: Add references for the "user reports". (Discuss in Talk:NVIDIA/Troubleshooting)

Proprietary driver 415 includes a new feature called HardDPMS. This is reported by some users to solve the issues with suspending monitors connected over DisplayPort. It is enabled by default since 440.26. If you are using an older driver, the HardDPMS option can be set in the Device or Screen sections. For example:

/etc/X11/xorg.conf.d/20-nvidia.conf
Section "Device"
    ...
    Option         "HardDPMS" "true"    
    ...
EndSection

Section "Screen"
    ...
    Option         "HardDPMS" "true"
    ...
EndSection

HardDPMS will trigger on screensaver settings like BlankTime. The following ServerFlags will set your monitor(s) to suspend after 10 minutes of inactivity:

/etc/X11/xorg.conf.d/20-nvidia.conf
Section "ServerFlags"
    Option     "BlankTime" "10"
EndSection

xrandr BadMatch

If you are trying to configure a WQHD monitor such as the DELL U2515H using xrandr and xrandr --addmode gives you the error X Error of failed request: BadMatch, it might be because the proprietary NVIDIA driver clips the maximum pixel clock frequency of the HDMI output to 225 MHz or lower. To set the monitor to its maximum resolution, you have to use the nouveau driver instead. You can force nouveau to use a specific pixel clock frequency by setting nouveau.hdmimhz=297 (or 330) in your kernel parameters.

Alternatively, it may be that your monitor's EDID is incorrect. See #Override EDID.

Another possible reason is that current NVIDIA drivers by default only allow modes explicitly reported by the EDID, while sometimes refresh rates and/or resolutions are desired that the monitor does not report (the EDID information is correct; current NVIDIA drivers are simply too restrictive).

If this happens, you may want to add an option to xorg.conf to allow non-EDID modes:

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
...
    Option         "ModeValidation" "AllowNonEdidModes"
...
EndSection

This can be set per-output. See README - Appendix B. X Config Options for more information.

Override EDID

See Kernel mode setting#Forcing modes and EDID, Xrandr#Troubleshooting and Qnix QX2710#Fixing X11 with Nvidia.

Overclocking with nvidia-settings GUI not working

This article or section needs language, wiki syntax or style improvements. See Help:Style for reference.

Reason: Duplication, vague "not working" (Discuss in Talk:NVIDIA/Troubleshooting)

Workaround is to use nvidia-settings CLI to query and set certain variables after enabling overclocking (as explained in NVIDIA/Tips and tricks#Enabling overclocking, see nvidia-settings(1) for more information).

Example to query all variables:

$ nvidia-settings -q all

Example to set PowerMizerMode to prefer performance mode:

$ nvidia-settings -a '[gpu:0]/GPUPowerMizerMode=1'

Example to set the fan speed to a fixed 21%:

$ nvidia-settings -a '[gpu:0]/GPUFanControlState=1' -a '[fan:0]/GPUTargetFanSpeed=21'

Example to set multiple variables at once (overclock GPU by 50MHz, overclock video memory by 50MHz, increase GPU voltage by 100mV):

$ nvidia-settings -a GPUGraphicsClockOffsetAllPerformanceLevels=50 -a GPUMemoryTransferRateOffsetAllPerformanceLevels=50 -a GPUOverVoltageOffset=100

Overclocking not working with Unknown Error

If you are running Xorg as a non-root user and trying to overclock your NVIDIA GPU, you will get an error similar to this one:

$ nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=10"
ERROR: Error assigning value 10 to attribute 'GPUGraphicsClockOffset' (trinity-zero:1[gpu:0]) as specified in assignment
        '[gpu:0]/GPUGraphicsClockOffset[3]=10' (Unknown Error).

To avoid this issue, Xorg has to be run as the root user. See Xorg#Rootless Xorg for details.

Power draw

This article or section needs expansion.

Reason: What is the point of this section? (Discuss in Talk:NVIDIA/Troubleshooting)

Check which processes are using the driver:

# lsof /dev/nvidia*
kwin_wayl  867      user   17u   CHR   195,0      0t0  418 /dev/nvidia
kwin_wayl  867      user   18u   CHR   195,0      0t0  418 /dev/nvidiactl

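Check whether runtime power management is in effect for the GPU (here at PCI address 0000:01:00.0). In this example, the GPU only enters the suspended state after the nvidia_drm module has been unloaded: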

$ grep . /sys/bus/pci/devices/0000:01:00.0/power/*
/sys/bus/pci/devices/0000:01:00.0/power/control:auto
/sys/bus/pci/devices/0000:01:00.0/power/runtime_active_time:445933
/sys/bus/pci/devices/0000:01:00.0/power/runtime_status:active
/sys/bus/pci/devices/0000:01:00.0/power/runtime_suspended_time:1266
/sys/bus/pci/devices/0000:01:00.0/power/wakeup:disabled
# rmmod nvidia_drm
$ grep . /sys/bus/pci/devices/0000:01:00.0/power/*
/sys/bus/pci/devices/0000:01:00.0/power/control:auto
/sys/bus/pci/devices/0000:01:00.0/power/runtime_active_time:461023
/sys/bus/pci/devices/0000:01:00.0/power/runtime_status:suspended
/sys/bus/pci/devices/0000:01:00.0/power/runtime_suspended_time:1064192
/sys/bus/pci/devices/0000:01:00.0/power/wakeup:disabled
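To check the relevant attributes without reading every file, a small sketch can summarize them. gpu_power_state is a hypothetical helper; it reads power/control (auto allows runtime suspend, on keeps the device powered) and power/runtime_status of the given device, defaulting to the example address above.

```sh
#!/bin/sh
# Sketch: summarize runtime power management for one PCI device from sysfs.
# gpu_power_state is a hypothetical helper; the optional second argument replaces /sys.
gpu_power_state() {
    dir=${2:-/sys}/bus/pci/devices/${1:-0000:01:00.0}/power
    control=$(cat "$dir/control") || return 1
    status=$(cat "$dir/runtime_status") || return 1
    case $control in
        auto) echo "runtime PM allowed, status: $status" ;;
        on)   echo "runtime PM disabled (control=on), status: $status" ;;
        *)    echo "control=$control, status: $status" ;;
    esac
}
```

For example:

$ gpu_power_state 0000:01:00.0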

Test software GL

The binary NVIDIA driver does not honor the Mesa environment variable LIBGL_ALWAYS_SOFTWARE=1, but you can direct libglvnd and EGL to use Mesa by setting the following environment variables:

__GLX_VENDOR_LIBRARY_NAME=mesa
__EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/50_mesa.json

This makes GLX and EGL use Mesa's libGL, i.e. software GL, which lets you check whether a bug is related to the NVIDIA GL library.
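To test a single program this way without changing your session environment, the variables can be set per command. with_mesa is a hypothetical wrapper around the two variables above.

```sh
#!/bin/sh
# Sketch: run one command with GLX and EGL routed to Mesa through libglvnd.
# with_mesa is a hypothetical helper; the values are the ones given above.
with_mesa() {
    __GLX_VENDOR_LIBRARY_NAME=mesa \
    __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/50_mesa.json \
        "$@"
}
```

For example:

$ with_mesa glxinfo | grep "OpenGL renderer"

If the switch worked, the renderer string should name a Mesa software renderer (such as llvmpipe) rather than the NVIDIA GPU.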