Pages

Thursday, November 21, 2019

Novell OES 11.1 shell commands hang without reason

Novell OES 11 linux shell commands hang for a very long time. It happened to me and I was able to fix it without restarting the whole server.

Server release info:
Novell Open Enterprise Server 11 (x86_64)
VERSION = 11.1
PATCHLEVEL = 1
Symptoms:
# strace -p26881
Process 26881 attached - interrupt to quit
connect(146, {sa_family=AF_FILE, path="/var/run/novell-lum/.nam_nss_sock"}, 35^C <unfinished ...>
All http services that are not using linux authentiction are working - iManager works, iMonitor works, Remote Manager is not working because it is for controlling linux services and require linux authentication.

It seems that something went wrong with namcd service. Most linux commands (like cron, id, ssh...) are trying to check through that socket for a user FDN from eDirectory. Linux User Managment (LUM) maps linux users to eDirectory users and every linux program executed is asking namcd (eDirectory Novell Account Management caching daemon) for information about current user. If namcd is not working it will just use local linux user db for that and if is working correctly you can check every user in eDir like this:

Note: admin is not local linux user but eDir user

Working LUM and namcd:
# id admin
uid=602(admin) gid=602 groups=602,601(sms smdr group)
Not working LUM and/or namcd:
# id admin
id: admin: No such user
The problem is when namcd is working but not returning any data through that socket. Then you get every linux command that checks for current user to hang forever.
# id admin
(hangs forever until you press ctrl+c)
Solution is to kill all hanged processes and then restart namcd
# rcnamcd restart
I was able to login via ssh in a strange way - it asks me for password and then hanged and I left it like this and after an hour I pressed ctrl+c and it showed me the desired shell on the remote server.

If you do this and it still does not work properly and you get messages like this in /var/log/messages
Nov 22 10:19:45 storage /usr/sbin/namcd[720]:  GetGIDsGroupListNumberOfGroupsOfWS: Error [32] in LDAP search while trying to find group FDNs with scope=base for cn=UNIX Workstation - storage,o=servers
You need to recreate nam.conf. For more information look here: http://geroyblog.blogspot.com/2013/04/novell-enterprise-linux-server-install.html

No comments: