Linux Debug Tricks

From Renesas.info
Revision as of 14:25, 8 September 2023 by Seebe (talk | contribs) (Added initcall_debug)


This page contains many helpful suggestions for debugging issues with the Linux kernel.


Common Tools

  • grep : The kernel has thousands of files, and grep is the most powerful way to find what you are looking for. Use the -R option to search recursively. Remember that grep is case sensitive by default, so use -i if you need it to ignore case.
    • An example that finds the function name start_kernel: $ grep -R "start_kernel"
  • find : Sometimes someone will give you just a filename, and the find command can help you find where that file is. Also, find is very helpful when combined with grep.
    • For example, if you just want to search header files: $ find . -name "*.h" | xargs grep purple
  • printf / printk : Believe it or not, printf and printk are still the most used debug methods for kernel developers. They are simple and work on any system. In the kernel, the printf function is called printk. But if you are debugging something like u-boot, you will use the standard printf.
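
The find and grep combination above can be wrapped in a small helper. This is only a sketch: the function name search_in_files is made up for illustration, and it assumes GNU find and xargs (for -print0 and -0).

```shell
# Search only files whose name matches a glob (default: C headers) for a pattern.
# Usage: search_in_files PATTERN [DIR] [GLOB]
search_in_files() {
    pattern=$1
    dir=${2:-.}
    glob=${3:-*.h}
    # -print0 / -0 keep file names that contain spaces intact.
    # grep -n prints line numbers, -H always prints the file name.
    find "$dir" -type f -name "$glob" -print0 | xargs -0 grep -nH -e "$pattern"
}

# Example (run from the top of the kernel source tree):
#   search_in_files purple include
```

Because find selects the files first, grep never has to read the .c files, device trees or documentation, which keeps the match list short.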

Error Messages

  • Many times an error message will be printed out. This happens a lot for a device driver that has trouble initializing during boot.
  • Your first step should be to find which source code file prints that message.
  • Simply use the grep command with -R option and search for that error message.

Error Codes

  • If an error message prints out an error code, for example "-110", you should look up what the error code means.
  • You can find what the error code numbers mean in the files include/uapi/asm-generic/errno-base.h (codes 1 to 34) and include/uapi/asm-generic/errno.h (higher codes; for example, 110 is ETIMEDOUT).
  • Since the code will use the #define name (not the actual number), you can then search the driver file for where that error code name is returned.
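
As a sketch of that lookup, the helper below turns a number such as -110 into its #define name by reading the two asm-generic header files. The function name errno_name is hypothetical, and it assumes you pass the top of a kernel source tree (architectures that override asm-generic values are not handled).

```shell
# Print the symbolic name defined for an errno number in a kernel source tree.
# Usage: errno_name NUMBER [KERNEL_SRC_DIR]
errno_name() {
    num=${1#-}          # accept "-110" as well as "110"
    src=${2:-.}
    # Header lines look like: #define ETIMEDOUT 110 /* Connection timed out */
    awk -v n="$num" '$1 == "#define" && $3 == n { print $2 }' \
        "$src/include/uapi/asm-generic/errno-base.h" \
        "$src/include/uapi/asm-generic/errno.h"
}

# Example (run from the top of the kernel source tree):
#   errno_name -110
```

Once you have the name, a plain grep for it in the driver file shows every place that could have returned the error.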

Finding the Device Driver File

  • When adding or configuring a driver to your system, you will probably be editing the Device Tree.
  • If there is an issue with that driver or peripheral, you will need to find the source code for the driver.
  • You can use the "compatible" name listed in the device tree, and grep, in order to find the location of the device driver file.
  • Start your search in the "drivers" directory (to avoid all the matches that will be found in other device tree files).
  • Example: Find the Watchdog Timer Driver
    • RZ/G2L Device Tree: rz_linux-cip/arch/arm64/boot/dts/renesas/r9a07g044.dtsi
      wdt0: watchdog@12800800 {
      	compatible = "renesas,r9a07g044-wdt",
      		     "renesas,rzg2l-wdt";
      	reg = <0 0x12800800 0 0x400>;
      	clocks = <&cpg CPG_MOD R9A07G044_WDT0_PCLK>,
      		 <&cpg CPG_MOD R9A07G044_WDT0_CLK>;
      	clock-names = "pclk", "oscclk";
      	interrupts = <GIC_SPI 49 IRQ_TYPE_LEVEL_HIGH>,
      		     <GIC_SPI 50 IRQ_TYPE_LEVEL_HIGH>;
      	interrupt-names = "wdt", "perrout";
      	resets = <&cpg R9A07G044_WDT0_PRESETN>;
      	power-domains = <&cpg>;
      	status = "disabled";
      };
      
    • Use the grep command:
      $ grep -R "renesas,rzg2l-wdt" *
      arch/arm64/boot/dts/renesas/r9a07g044.dtsi:				     "renesas,rzg2l-wdt";
      arch/arm64/boot/dts/renesas/r9a07g044.dtsi:				     "renesas,rzg2l-wdt";
      arch/arm64/boot/dts/renesas/r9a07g044.dtsi:				     "renesas,rzg2l-wdt";
      arch/arm64/boot/dts/renesas/r9a07g043.dtsi:				     "renesas,rzg2l-wdt";
      arch/arm64/boot/dts/renesas/r9a07g043.dtsi:				     "renesas,rzg2l-wdt";
      arch/arm64/boot/dts/renesas/r9a07g054.dtsi:				     "renesas,rzg2l-wdt";
      arch/arm64/boot/dts/renesas/r9a07g054.dtsi:				     "renesas,rzg2l-wdt";
      arch/arm64/boot/dts/renesas/r9a07g054.dtsi:				     "renesas,rzg2l-wdt";
      Documentation/devicetree/bindings/watchdog/renesas,wdt.yaml:          - const: renesas,rzg2l-wdt     # RZ/G2L
      Documentation/devicetree/bindings/watchdog/renesas,wdt.yaml:              - renesas,rzg2l-wdt
      drivers/watchdog/rzg2l_wdt.c:	{ .compatible = "renesas,rzg2l-wdt", },
      scripts/dtc/include-prefixes/arm64/renesas/r9a07g044.dtsi:				     "renesas,rzg2l-wdt";
      scripts/dtc/include-prefixes/arm64/renesas/r9a07g044.dtsi:				     "renesas,rzg2l-wdt";
      scripts/dtc/include-prefixes/arm64/renesas/r9a07g044.dtsi:				     "renesas,rzg2l-wdt";
      scripts/dtc/include-prefixes/arm64/renesas/r9a07g043.dtsi:				     "renesas,rzg2l-wdt";
      scripts/dtc/include-prefixes/arm64/renesas/r9a07g043.dtsi:				     "renesas,rzg2l-wdt";
      scripts/dtc/include-prefixes/arm64/renesas/r9a07g054.dtsi:				     "renesas,rzg2l-wdt";
      scripts/dtc/include-prefixes/arm64/renesas/r9a07g054.dtsi:				     "renesas,rzg2l-wdt";
      scripts/dtc/include-prefixes/arm64/renesas/r9a07g054.dtsi:				     "renesas,rzg2l-wdt";
      
    • The only driver match is drivers/watchdog/rzg2l_wdt.c. The very bottom of that file contains the "compatible" table:
      static const struct of_device_id rzg2l_wdt_ids[] = {
      	{ .compatible = "renesas,rzg2l-wdt", },
      	{ /* sentinel */ }
      };
      MODULE_DEVICE_TABLE(of, rzg2l_wdt_ids);
      
      static struct platform_driver rzg2l_wdt_driver = {
      	.driver = {
      		.name = "rzg2l_wdt",
      		.of_match_table = rzg2l_wdt_ids,
      	},
      	.probe = rzg2l_wdt_probe,
      };
      module_platform_driver(rzg2l_wdt_driver);
      
      MODULE_DESCRIPTION("Renesas RZ/G2L WDT Watchdog Driver");
      MODULE_AUTHOR("Biju Das ");
      MODULE_LICENSE("GPL v2");
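
To skip the device tree and documentation matches shown in the grep output above, the search can be limited to C files under the "drivers" directory. This is a minimal sketch: find_driver is a made-up helper name, and --include is a GNU grep option.

```shell
# List driver source files that declare a given "compatible" string.
# Usage: find_driver COMPATIBLE [KERNEL_SRC_DIR]
find_driver() {
    compat=$1
    src=${2:-.}
    # -F: match the string literally, -l: print only the file names,
    # --include: look only at C sources below drivers/.
    grep -RlF --include='*.c' -e "\"$compat\"" "$src/drivers"
}

# Example (run from the top of the kernel source tree):
#   find_driver "renesas,rzg2l-wdt"
```

Some drivers (audio, for example) live outside "drivers", so widen the search directory if this prints nothing.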
      


View Interrupts

  • The 'virtual' file /proc/interrupts lists every interrupt in the system, and counts every time that interrupt occurs.
  • Use this command to print it out:
$ cat /proc/interrupts
  • When a device is not working, first check whether interrupts are occurring; that will tell you a lot. This is usually the first step in any debugging.
  • Sometimes device operations will fail silently (no error message), but if you see that an interrupt occurred, you at least know something has happened.
  • For example, maybe you are trying to detect an I2C device on boot up, but you do not see any message for it. If you see I2C controller interrupts, then you at least know the I2C controller works, but maybe the Device Tree has the wrong I2C address. In that case there is no "error" to report (other than the mistake you made in your Device Tree).
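
One way to check whether interrupts are occurring is to total the counts for a device before and after an operation. The sketch below assumes the usual /proc/interrupts layout (a header row with one column per CPU, then one row per interrupt); irq_total is a hypothetical helper name, and the "i2c" pattern in the example is only an illustration.

```shell
# Sum the per-CPU counts of every /proc/interrupts line matching a pattern.
# Usage: irq_total PATTERN [FILE]
irq_total() {
    pattern=$1
    file=${2:-/proc/interrupts}
    awk -v pat="$pattern" '
        NR == 1 { ncpu = NF; next }     # header row: one column per CPU
        $0 ~ pat {
            # field 1 is the IRQ number, fields 2..ncpu+1 are the counts
            for (i = 2; i <= ncpu + 1; i++) total += $i
        }
        END { print total + 0 }
    ' "$file"
}

# Example:
#   before=$(irq_total i2c)
#   ...run the operation that should use the device...
#   after=$(irq_total i2c)
#   echo "interrupts seen: $((after - before))"
```

If the difference is zero, the hardware never signalled the CPU, which points toward pin, clock or Device Tree setup rather than the driver logic.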


If there are no Error Messages

  • Sometimes you get clues that the driver might be doing something (interrupts), but you are not really sure what or why. In that case, you can put some print messages in the API calls into the driver.

For example, say that the system reports it is playing audio, but nothing is coming out. Basically, if aplay works and you don't hear anything, then it is usually one of two issues:

1. The pins are not really configured correctly

2. You have the channel muted, so the ALSA subsystem doesn't even bother sending data to the driver.

The driver is sound/soc/sh/rz-ssi.c

The way ALSA drivers work is that the subsystem first configures the audio for the correct sample rate and format, then starts the stream and feeds the data out in blocks. Basically, here are the APIs:

 
static const struct snd_soc_dai_ops rz_ssi_dai_ops = {
        .trigger        = rz_ssi_dai_trigger,
        .set_fmt        = rz_ssi_dai_set_fmt,
        .hw_params        = rz_ssi_dai_hw_params,
};

The function rz_ssi_dai_trigger() is the one that the audio subsystem calls to start and stop the stream, so it must run whenever playback really begins.

Put a printk in that function to see whether it is really getting called or not.

That will at least point you in a direction to look next.

How to read a Crash Log

  • xxx

Force a Stack Trace Dump

  • There are many ‘soft pointers’ in the kernel, so sometimes you can’t figure out how you got into a function. You get a warning message, and you know what file printed the message, but you cannot figure out "who" called that function.
  • The WARN_ON(1) macro defined by the kernel will cause a stack dump so you can see who called that function. Since it is a warning macro, it will not cause the system to crash. The system will continue to operate.
  • You can also put some condition in WARN_ON. For example: WARN_ON(my_flag == 1);
  • There is also a WARN_ONCE( ) macro that will make the stack dump print only once.
  • For example, suppose an argument passed to the function rzg2l_mipi_dsi_attach() causes an error (maybe from a bad Device Tree value), and you need to know what function called it. The caller is determined at run time, not compile time, so you cannot find it just by reading the code. Putting WARN_ON(1) in the function prints the call trace, which tells you the calling function. You can then use grep to find what file has that function, and add some debug info (using printk) there.

[    2.039546] Call trace:
[    2.041986]  rzg2l_mipi_dsi_attach+0x5c/0x244  <<< Where we put our WARN_ON
[    2.046336]  drm_bridge_attach+0x64/0xc4       <<< Where we need to look next for the bad args 
[    2.050254]  rcar_du_encoder_init+0xac/0x140
[    2.054514]  rcar_du_modeset_init+0x3b4/0x4ac
[    2.058861]  rcar_du_probe+0xb0/0x16c
[    2.062514]  platform_drv_probe+0x50/0xa0
[    2.066515]  really_probe+0x260/0x3d0
[    2.070169]  driver_probe_device+0x54/0xf0     <<< Means this is all part of the initial driver loading at boot time
[    2.074256]  __device_attach_driver+0xc0/0x110
[    2.078689]  bus_for_each_drv+0x78/0xd0
[    2.082516]  __device_attach+0xd8/0x174
[    2.086344]  device_initial_probe+0x10/0x20
[    2.090517]  bus_probe_device+0x90/0xa0
[    2.094344]  deferred_probe_work_func+0x6c/0xa0
[    2.098867]  process_one_work+0x1b8/0x304
[    2.102867]  worker_thread+0x23c/0x44c
[    2.106607]  kthread+0x128/0x160
[    2.109829]  ret_from_fork+0x10/0x1c
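
To go from a trace like the one above to source files, it helps to strip the timestamps and offsets so only the function names remain. This is a rough sketch that assumes the log line format shown above; trace_functions is a made-up name.

```shell
# Read kernel log text on stdin and print just the function names that
# appear in call trace lines of the form "[ time ]  name+0xOFF/0xSIZE".
trace_functions() {
    sed -n 's/^\[ *[0-9.]*] *\([A-Za-z_][A-Za-z0-9_.]*\)+0x.*/\1/p'
}

# Example (run from the top of the kernel source tree):
#   dmesg | trace_functions | head -n 3
#   grep -Rn "drm_bridge_attach(" drivers/
```

The first name printed is the function containing the WARN_ON, and the second is its caller.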

Bind and Unbind Drivers

  • If you want to completely reset a peripheral, you can 'unbind' the device from the driver, then 'bind' it back again.
  • When you bind, it is the same operation that happened during system boot (sets up the registers, reads the settings from the Device Tree, looks for attached devices, etc.)
  • Here is an old article by Greg K-H, but still relevant today: https://lwn.net/Articles/143397/
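
On a platform device, unbind and bind are done by writing the device name to files in the driver's sysfs directory. The helper below is a sketch only: rebind_device is a hypothetical name, and the driver and device names in the example are assumptions based on the watchdog node shown earlier (list /sys/bus/platform/drivers/<driver>/ on your board to see the real device names). It must be run as root.

```shell
# Unbind a platform device from its driver, then bind it again.
# Usage: rebind_device DRIVER DEVICE [SYSFS_ROOT]
rebind_device() {
    driver=$1
    device=$2
    drv_dir=${3:-/sys}/bus/platform/drivers/$driver
    # Writing the device name to "unbind" detaches it from the driver.
    printf '%s' "$device" > "$drv_dir/unbind" || return 1
    # Writing it to "bind" makes the kernel probe the device again.
    printf '%s' "$device" > "$drv_dir/bind"
}

# Example (names are assumptions, check them on your system first):
#   rebind_device rzg2l_wdt 12800800.watchdog
```

Watch the kernel log while binding; any probe error messages will be printed again, just as they were at boot.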

Use initcall_debug to Confirm Driver is Loaded

  • When you are not seeing your device show up in the system, you should check your boot log to see if there are any error messages from the driver.
  • If you do not see any message at all from the driver, you might want to make sure your driver is being loaded. To do that, you will want to add initcall_debug to the kernel command line.
  • For example, in u-boot:
=> setenv bootargs 'root=/dev/mmcblk1p2 rootwait initcall_debug'
=> booti 0x48080000 - 0x48000000
  • The initcall_debug option will make the kernel print every initialization function (initcall) it runs, which includes the init function of each built-in driver. Expect to see a lot of text. However, you can then check that log to make sure your driver was loaded.
  • If you do not see your driver being loaded, it might not be enabled in the kernel (menuconfig).
  • Remember, during kernel boot, the kernel initializes each driver one at a time. After each driver initialization, the kernel checks the Device Tree to see if any matching devices are declared. That is what the "compatible" string is used for: drivers and devices are matched up by string name. If a device 'compatible' name matches a driver 'compatible' name, then the kernel will immediately call the "probe" function inside the driver code in order to try to detect, set up and configure that device.
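
Because the initcall_debug log is so long, it is easier to filter it for one driver. The sketch below assumes the usual "initcall <function>+0x... returned <code> after <n> usecs" log lines and that the driver uses module_platform_driver(), which names the init function <driver struct>_init (for the watchdog example, rzg2l_wdt_driver_init). The helper name initcall_status is made up.

```shell
# Read a boot log on stdin and print the return code of one initcall.
# Prints nothing if that initcall never ran.
# Usage: dmesg | initcall_status INIT_FUNCTION_NAME
initcall_status() {
    name=$1
    sed -n "s/.*initcall ${name}+[^ ]* returned \(-\{0,1\}[0-9][0-9]*\) after.*/\1/p"
}

# Example:
#   dmesg | initcall_status rzg2l_wdt_driver_init
```

A result of 0 means the driver registered successfully, a negative number is an error code, and no output suggests the driver is not built into the kernel.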