Why does PXE break every once in a while?

In our environment at Psomas, we have one DS 6.8 SP2 server that handles imaging jobs across the enterprise. This server also hosts the PXE Manager but not the other PXE services; those run on a DHCP server with option 60. We've configured each office the same way, with the PXE services running on the DHCP server with option 60.

Imaging works fine most of the time. Just once in a while, something makes the PXE Manager on the DS stop, or the PXE services on the DHCP servers stop functioning.

We then have to restart services on the DS and/or the DHCP servers, reboot the DS, or rebuild PXE at the location that is failing. Then things kick off again.

Keep in mind, when the PXE Manager is working, only one or two locations may be down, while others are imaging.

Has anyone had this problem, where PXE just intermittently stops?

Vinh H.

PXE Restarts

Vinh,

We have seen this behavior in our environment, in several different scenarios.

If we restart the main Altiris server that runs DS, PXE services on our field servers need to be restarted.

Occasionally users at a field office will complain that their systems are stopping at the PXE menu; we then have to restart PXE there.

Task Server to the rescue. I just created a task job that uses a service control to stop the PXE Config Helper service on a field server, runs a VBScript to sleep for 5 seconds (a one-line script: WScript.Sleep 5000 'sleep for 5 seconds), and then uses a service control to start the PXE Config Helper again. Now we can restart the main server and run a script to restart PXE everywhere.
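For anyone without Task Server, the same stop, pause, start sequence can be sketched in Python. This is only a minimal illustration, not the job described above: the service name is taken from the post as written and may differ on your server, and the `run` and `sleep` parameters are hypothetical hooks added so the sequence can be dry-run without touching a real service.

```python
import subprocess
import time

# Assumption: the service name as written in the post. Check the exact
# name on your own server before relying on it.
SERVICE_NAME = "PXE Config Helper"


def restart_service(name, pause_seconds=5, run=subprocess.run, sleep=time.sleep):
    """Stop a Windows service, wait, then start it again.

    Returns the two commands that were issued, in order.
    """
    stop_cmd = ["net", "stop", name]
    start_cmd = ["net", "start", name]
    # A failed stop (for example, the service is already stopped) is tolerated.
    run(stop_cmd, check=False)
    # Same pause as the one-line WScript.Sleep 5000 script.
    sleep(pause_seconds)
    # Raise an error if the service does not come back up.
    run(start_cmd, check=True)
    return [stop_cmd, start_cmd]

# Example call on the server that hosts the service:
# restart_service(SERVICE_NAME)
```

`net stop` and `net start` are used here because they accept a service's display name; run the script from an elevated prompt on the field server itself.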

Hope this helps!

Pat

Use Windows Scheduler

Task Server is indeed a sexy solution.

But you could use a more sledgehammer-nut solution. Why not have a Windows scheduled task run every x minutes to start the service?

If the service is already started, it will just error out. And if it's stopped, it will start again.
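A minimal Python sketch of what such a scheduled task might run is below. It is an illustration under assumptions, not a tested fix: it checks the state with `sc query` first rather than blindly starting, the service key name you pass in is whatever your server actually uses, and `run` is a hypothetical hook for dry runs.

```python
import re
import subprocess


def parse_state(sc_output):
    """Extract the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def ensure_running(service, run=subprocess.run):
    """Start the service only if `sc query` reports it as STOPPED.

    Returns True if a start was attempted, False otherwise.
    """
    result = run(["sc", "query", service], capture_output=True, text=True)
    if parse_state(result.stdout) == "STOPPED":
        run(["sc", "start", service])
        return True
    return False
```

Note that `sc` expects the service's key name, not its display name. Checking first keeps the scheduled task quiet when the service is healthy, but it only helps when the service has actually stopped.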

Kind Regards,
Ian./

RE: PXE Broken

These are two workarounds for the issue, but what could be the root cause? What troubleshooting steps should we go through to begin locating it?

And again, why does PXE

And again, why does PXE even stop working? That's what I'm still wondering. It shouldn't require a scheduled restart.

Is anyone running DS 6.9? Is the new PXE in 6.9 any better?

Vinh Huynh

Troubleshooting Process crashes

Sadly, workarounds are often the fastest way to move forward. Full debugging takes time.

When we had lots of process crashes, we got debug files from the processes and sent them to Altiris Support. I haven't done this for PXE because, until recently, we had to do a nightly restart of the server to ensure Wake-On-LAN remained functional.

You can attach a debugger to a process, and when it crashes it will create a full dump for you.

Here is the Microsoft KB article for using ADPlus:
http://support.microsoft.com/kb/286350

Kind Regards,
Ian./

Scheduling restart will not work if PXE is not stopped.

Ian

The problem with your method is it might not help all the time.

Sometimes the PXE service is still running; it's just stuck. You try to restart the service and it hangs while stopping PXE. A server reboot is required when this happens.

Vinh Huynh

Taskkill?

This is just off the cuff, but if the service doesn't respond, it will return a specific error. You could look for this error in the script, and if found, kill the PXE server process using taskkill. A service start at that point then might work.
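A rough Python sketch of that idea follows. It is a sketch under stated assumptions: the post does not say which error is returned, so any non-zero exit code or a timeout on the stop is treated as "not responding"; the process image name is a placeholder you would have to look up yourself; and `run` is a hypothetical hook for dry runs.

```python
import subprocess


def force_restart(service, image_name, timeout=60, run=subprocess.run):
    """Try a normal stop; if it fails or hangs, kill the process, then start.

    Returns True if taskkill was used, False if the normal stop succeeded.
    """
    killed = False
    try:
        stopped = run(["net", "stop", service], timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        # The stop request itself hung, as described earlier in the thread.
        stopped = False
    if not stopped:
        # Forcefully end the process by image name (hypothetical name supplied by caller).
        run(["taskkill", "/F", "/IM", image_name])
        killed = True
    run(["net", "start", service], check=True)
    return killed
```

Because a stop on an already-stopped service also exits non-zero, this will sometimes call `taskkill` when there is nothing to kill; that call simply fails and the start proceeds.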

Kind Regards,
Ian./

Our PXE Always breaks too

Again, this happens for us. And we also have a workaround. (Why it doesn't work and Altiris doesn't just fix it is beyond me). We have a job in the Deployment Console that is run on the Deployment Server called "Restart PXE Service" which simply restarts the service. The technicians can run that job whenever they are trying to image a computer and PXE is out to lunch.

LAME LAME LAME LAME LAME if you ask me, but it's what we have to do to keep running.

PXE DS 6.9

After reading the Altiris forums, I noticed someone reported a memory leak in PXE Manager in DS versions prior to 6.9. This leak could be the root cause for some users. There are two possible ways to address the issue. 1. With DS, create a job that restarts the PXE Manager at times when PXE is not expected to be in use. I would do this once a week, or more or less often depending on the number of clients you have. 2. Upgrade to DS 6.9, which addresses this memory leak. Try either of these and please post your findings.

-Nelo

DS 6.9 too

We've got DS 6.9 running and I've seen the same kind of strange behavior.

We're not in full production mode yet, but even during our beginning stages of implementation I've noticed that occasionally I'll go to open the PXE Configuration Utility and it'll say there was some error, would you like to debug, and if you wait a bit it'll finally say that there's a problem with PXE services not running.

It seems that if I manually restart all of the PXE services on the server things are okay.

PXE Servers not showing up in PXE Manager

Since I have taken over Altiris management I have not been able to figure out why all of my PXE servers will not show up in PXE Manager. The problem seems to be with two of the four total PXE servers that we have. The scenario:

We have PXE Server "A" and PXE Server "B"

If PXE Server "A" shows up in the PXE Manager and I restart PXE Server "B", then PXE Server "B" replaces PXE Server "A" in the PXE Manager, and vice versa.

At first I thought it might be an IP address issue but now I'm not sure. I have set both PXE server addresses and I know that they are different.

Does anyone have any ideas on this?

Possible fixes for DS 6.9 PXE automation issues.

I have not implemented these fixes yet, but I found these articles on the Symantec website. I plan on implementing them next week to see if they solve an issue I am having: when I push an image to a new computer, sometimes PXE will not stop to allow the selection of DOS and will just exit and load the OS.

Article ID: 41019, Receiving error "unable to boot to DOS/Linux/WinPE automation" when running jobs that use PXE redirection

Article ID: 41043, How to work around the problems with PXE redirection in Deployment Solution 6.9 with DOS automation

PXE stopping intermittently

I've implemented article ID 41043 (there is one for Linux and WinPE as well), and it seems to work correctly. I'm also looking at the following KB article:

Article ID: 36468
PXE config utility constantly updates and PXE config Services stops automatically

It has to do with the system account missing from the eXpress share.

Like many others, I've done plenty of wondering why this happens...

Article ID: 41019

With regard to this article, here are some things I did to counter this, and so far they have worked:

-Ensure F and additional drives are mapped in DS Config, as well as in the menu option
-Use WinPE, autodetect drivers, use "Factory-WinPE"
-Enter server/ip in the lmhosts file
-This seems to be the main issue: check the "local" PXE servers for MenuOption164.tmp folders; sometimes they don't update all the way (don't know why yet). Look in this folder for files ending in .tmp and remove that extension, then rename the .tmp folder (for example, rename MenuOption164.tmp to MenuOption164).

The problem is, if you update the Menu Options, the .tmp folder may come back, so check the master options on the master PXE server.
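The manual rename in the last list item could be scripted along these lines. This is a sketch only, with an assumed folder layout: `repair_menu_option` is a hypothetical helper, it refuses to overwrite anything that already exists, and you would want a backup of the PXE folder before running something like it.

```python
from pathlib import Path


def repair_menu_option(tmp_folder):
    """Strip a trailing .tmp from the files in tmp_folder, then from the folder itself.

    Returns the folder's resulting path.
    """
    folder = Path(tmp_folder)
    for entry in list(folder.iterdir()):
        if entry.is_file() and entry.suffix.lower() == ".tmp":
            new_name = entry.with_suffix("")
            # Leave the file alone if the finished name is already present.
            if not new_name.exists():
                entry.rename(new_name)
    if folder.suffix.lower() == ".tmp":
        target = folder.with_suffix("")
        if target.exists():
            raise FileExistsError(f"{target} already exists; resolve manually")
        folder.rename(target)
        folder = target
    return folder
```

You would call it with the full path to the affected folder on the local PXE server, for example the MenuOption164.tmp folder mentioned above.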

We had a similar issue and

We had a similar issue and it was due to the volume of systems we had (4000). As it was explained to me, there was some kind of bug in axengine that would overrun PXE, killing it to the point where it would not boot the MAC it was supposed to (forcing us to press F8 and manually select the boot environment). The fix for us was a custom axengine build provided by Altiris. Supposedly that fix is also now in 6.9 (we've yet to install it, though). Our issue would appear after just a few days, and a restart of axengine and PXE would fix it. Not sure if it is what you are experiencing, but it may be worth inquiring about...

PXE errors

I have to restart PXE services every 3-4 days, and more often if we image a lot of systems. I can usually tell it's time when I can't remote into systems using the console and then an image fails. I just reboot the system every Monday and Wednesday morning, and it helps to keep us functional. We have PXE and DS on one system and a separate DHCP server as well.