Sunday, October 7, 2018

It's about Trust... and Pride

My computer is all fucked up again.  You know, for a guy that likes computers... for a guy that used to work as a computer tech and got paid well for it... I'm starting to think that I'm just bad at this.

I think to really prep this, I need a recap of my computer lore.  This won't go into too much depth, but as I have to read through a bunch of posts just to remind myself of my computer journey, I figure you loyal readers will have to take the same journey.  My first computer was back in junior high school.  I guess technically it wasn't mine and instead was the family's computer.  But it was mine.  A shiny brand new several thousand dollar Commodore 64 with a tv/monitor and all manner of accessories.  After many years of wonderful service (and being replaced a couple times) it was replaced with an Amiga 500.  That was a beast of a computer.  It was also the bowing out point for my parents as they realized what it would take to 'keep up' with computers.


The Amiga 500 went through a couple upgrades, but while in college I took all my savings and bought an Amiga 4000.  My first desktop (as opposed to those funny keyboard styles) and my first hard drive.  It had 2MB of RAM and a 120MB hard drive.  It was awesome!  And then 3 months later it broke.  Unfortunately Commodore (the company that made the Amiga) went out of business in those intervening months.  So only 3 months after spending like $2500, I had nothing more than a door stop.  I think I returned to an Amiga 500 for a while, but I obviously recognized that this wasn't a long-term solution.  And as much as I loved Amigas over PCs... well, PCs won that battle.

My next computer was my Compaq tower.  It had a Pentium processor (166MHz!), 24MB of RAM, a 2.5GB hard drive, and a 6X CD-ROM.  Obviously today it's something of a joke but it was a fairly nice machine at the time.  I kept it upgraded and usable for several years, from Windows 95 to Windows 98 Second Edition.

When I got a job as a computer tech I built my first computer, and to one degree or another I've been working on that computer ever since.  I built that system in 1999 and I can't really say what components it started with or any major stops until recently, but there were minor upgrades of hard drives, memory and video cards, and then major upgrades of motherboard/processor/RAM.  It changed cases several times and sometimes even started over with a clean version of Windows.  But from 1999 to now I've always pulled at least one piece of the past computer into any 'new' build.

Because of this blog, I can actually start to trace the computer starting around 2010.  No, this blog didn't exist back then, but I have the stats of the computer I was upgrading from in November 2014.  So my 2010-2014 rig was this:

~2010
Processor:        AMD Phenom II X4 955 Black Edition (3.2 GHz)
Motherboard:   ASUS M4A87TD/USB3 AM3 AMD 870
RAM:               8GB DDR3 1600
Case:                Cooler Master (about 2 years old)
Hard Drives:    500GB Western Digital ATA 'Caviar Blue' 7200rpm
                         750GB Western Digital ATA 'Caviar Black'  7200rpm
Video Card:      EVGA GeForce 465 1GB
Power Supply:  Cooler Master Silent Pro M700 700w 80 Plus Bronze


If you go back to the posts I'm linking to you'll probably notice that I list more components like speakers, headset, mouse, keyboard, and monitor.  I'm not including those here as I don't consider them part of the computer.  To me the computer is the case and the components within it.  If I can 'upgrade' the part without opening the case, then it's just a peripheral.  That's not to say I don't like those peripherals or that they're not expensive... but they're parts that plug into the computer and not the computer itself.

So back to the 2010 build.  I can't say exactly what was upgraded between 2010 and 2014, as those are the stats that I ended up with in 2014.  But let's assume most of that was in the original build.  I still have those two hard drives in my system.  In 2010 one was a boot drive and they're both just storage drives now, but I still have them and they're a connecting DNA.

My 2014 build was made because of a game.  Assassin's Creed Unity required a better video card, and the processor barely met the minimum requirements.  In times past I probably would have just upgraded the video card as everything else was going smoothly... but this was soon after I started working as a nurse and I wanted to splurge.  I hadn't had a top end system for a long time and I wanted to look at new games and such and not have to even blink at the required specs.  I detail the process in this post but what I ended up with is this:

2014
Processor:        Intel Core i5 4690K Devil's Canyon
Motherboard:   ASRock Z97 Extreme6 LGA 1150 Intel Z97
RAM:              16 GB DDR3 1600
Case:                Fractal Design Define R4 Black Pearl
Hard Drives:    Samsung 840 Pro Series 512GB Solid State Drive
                         500GB Western Digital ATA 'Caviar Blue' 7200rpm
                         750GB Western Digital ATA 'Caviar Black'  7200rpm
Video Card:      EVGA GeForce GTX 970 Superclocked 4GB
Power Supply:  SeaSonic 650W 80 Plus Gold

Not bad at ALL!  At the time I built that system it had been a while since I had seriously looked at components, and the one stupid upgrade was the processor.  I bought a processor made for overclocking and then never overclocked it.  It's like getting a convertible car and never putting the top down.

So this computer ran flawlessly for 3 years.  Well, not flawlessly but it ran fine.  It had occasional freezes and burps, but it always recovered just fine.  And honestly with Windows upgrades, software upgrades, and all these various components I'm not surprised or even worried at these types of problems.  Until they become problematic.  And in 2017 they became problematic.  I guess it's fitting that it was a game that initially spurred on the issue.  This time it was Tom Clancy's Ghost Recon Wildlands.  It was freezing just about any time I played that game, and I was at the point of formatting the hard drive and reinstalling Windows 10 when I saw that the system was failing over and over again.

I could have gone through and started to nitpick out the pieces.  Freezes like that could be the power supply and that's not too hard to swap out.  It could be the RAM and that's not too hard to test and eventually swap out.  But it could also be the processor and/or motherboard... and that means pulling the bones of the computer out and more or less starting over.  The price of the processor had actually gone UP since I built the system, and it was now hard to find.  To get a newer/better processor (that would also be cheaper) I'd need to change the motherboard.  And if I change the processor and motherboard, I should get new RAM to make sure it's running efficiently.  I kind of had the same thought from 3 years earlier... I have the money to just build a new system and I like building computers, so why not just start over with a new build?  While this system ran perfectly smooth and could play any game out there, it was 3 years old and not too far out from being outdated.

So... yeah, new build.  I talk about the build in this post but what I ended up with is this:

2017
Processor:       Intel Core i5 7600K
Motherboard:  Asus ROG Strix Z270G Micro ATX
RAM:              16 GB DDR4 2400 HyperX Predator
Case:                Fractal Design Define C
Hard Drives:    Samsung 840 Pro Series 512GB SSD
                         500GB Western Digital ATA 'Caviar Blue' 7200rpm
                         750GB Western Digital ATA 'Caviar Black'  7200rpm
Video Card:      EVGA GeForce GTX 1070 Superclocked 2 8GB GDDR5
Power Supply:  SeaSonic 650W 80 Plus Gold

New processor, new motherboard, new RAM, new Video Card.  I could have used the same case without problem but I decided that I wanted to play with a smaller system.  The normal 'mid-tower' size was something I hadn't taken advantage of for some time so I moved down to a Micro ATX case.  I would have gone to a Mini ITX, but I was going to also play with overclocking this time and wanted to do some water cooling.  As it's my first serious go at both overclocking and watercooling, I wanted at least 'some' space for it.

Every system I've built has had the required fans.  I think the 2010 build had 1 fan in it, the 2014 build had 3 fans in it, and this 2017 build has 2 fans as well as an AIO CPU Watercooler with its own radiator and fan.

This system is blazing fast.  With the motherboard designed from the ground up for overclocking and this particular processor I was able to get a 23% overclock without breaking a sweat.  I could probably get more, but I was happy taking it from 3.8 GHz to 4.7 GHz.  I have absolutely no reason to get rid of this system.  When I play most games I can peg the settings to the highest point and still get 60 frames per second... and understand my monitor can only go to 60 or 75 frames per second (it's an awesome widescreen monitor but wasn't designed for gaming).  I don't play twitch games so getting 100 or more frames per second isn't something I'm interested in.  Hell, I could have stayed with the GTX 970 video card level but I wanted to future proof the system a bit.  I have no reason to think this will need a major upgrade for another 2 or even 3 years.
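For anyone who wants to check the overclock figure, here's a minimal sketch of the arithmetic.  The function name is just something made up for illustration; the only inputs are the 3.8 GHz and 4.7 GHz clocks mentioned above.

```python
def overclock_percent(base_ghz: float, overclocked_ghz: float) -> float:
    """Return the clock-speed increase as a percentage of the base clock."""
    return (overclocked_ghz - base_ghz) / base_ghz * 100.0


# 3.8 GHz stock, 4.7 GHz overclocked (the numbers from this build)
print(f"{overclock_percent(3.8, 4.7):.1f}%")
```

That works out to a bit under 24%, which is in line with the roughly 23% quoted above.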

But then the problems started.  It was maybe spring when I first noticed.  It was a HARD freeze.  Most of the time when my computers have frozen it was just the screen locking up, the sound repeating, and then it goes either into a blue screen or a reboot.  This time the screen froze, but all the sound cut out as well.  It stayed that way for a bit (I don't recall how long it was in the beginning), and then rebooted.

Another burp and nothing to worry about... right?  Well it occurred again a couple weeks later.  And then again a few weeks later.  And then again.  It became regular enough that I made halfhearted attempts at troubleshooting it, but it wasn't stopping me from enjoying the system so I didn't want to really put my concentration into it.  But this summer, it got worse.  It was happening almost weekly.  Then a couple times a week.  Then a few times a week.  Then daily.  Then a couple times a day.

Each time I'd promise myself that I'd get around to getting it fixed.  But I just didn't have the time I'd want to invest into troubleshooting it.  Seriously... the first thing I'd want to do is crawl through the Windows error logs.  That would take a couple hours.  This felt like a RAM issue, so if I didn't find anything in Windows that sent me off in a different direction I'd start checking the RAM.  As the crashes were happening so much I could simply pull one stick of RAM and see if that fixed it.  If it didn't, pull the other stick and put the first one back in.  If that didn't, try both sticks one at a time in different RAM slots.

If that didn't fix it, my next step would be to either check the processor or more likely format and reinstall Windows, as that many crashes may have fucked up Windows enough that it's crashing on its own.

But while I was making plans for this.... it finally crashed and stayed crashed.  One evening a couple weeks ago now it did its normal crash but then followed it up with another crash a few minutes later.  To that point, the crashes hadn't been that close together.  And then it did it again less than a minute after the boot.  I stopped the next boot and got Windows up in safe mode so that I could check the error logs... but it crashed while in safe mode.  And then did it again.

I wasn't even having enough time to get my hardware monitoring software up to see if the processor was flipping out or any of the temperatures were out of whack, so I went instead into the BIOS.  This motherboard is thankfully designed for overclocking (and yes, I was starting to worry that maybe the overclock was to blame!) and it has some hardware monitoring built into it.  So I got there and started watching the fan speeds and temperatures... and they all looked fine.  The processor was just a tad warmer than what I'd want but it was nowhere near hot enough to even spin up the fans let alone cause a crash.

And then it crashed in the BIOS.  This time it didn't reboot, instead it just froze the screen.  The screen it froze on was showing the temps and the fan speeds and they were all well below problematic areas.

FUCK

I at least answered one question.... it's not Windows.  Or at least it's not JUST Windows.  It's not overheating... probably.  It could have spiked so fast that the screen didn't update and the system 'shut down' to protect itself.  At that point I had been halfheartedly troubleshooting it for about an hour and was nowhere, so I turned it off and went to bed early.

Between then and now I've done some monitoring and testing but I've lost all trust in my computer having the ability to be up and not crash.  It can be fine one day and then nothing but a crash fest the next.  There doesn't seem to be any indication that it's about to crash either... it can be while I'm playing an intensive game, it can be while I'm working in Photoshop, it can be while I'm playing full screen videos.  But it can just as easily be while I'm browsing the web or working in MS Word.  It's crashed once while writing this post, leaving me with two or three paragraphs to re-write.  I did open the case a few days ago to see if there was anything overt, but there are no scorch marks and the system is just about as clean as new (the case has a lot of filters to keep dust out).

The one problem I've found is that the AIO CPU watercooler seems to be flaking out.  If I let the system sit without me interfacing with it (either typing or moving the mouse), the temps on the CPU move up fairly quickly and the case fans start to work hard.  But all it takes to fix it is for me to tap the mouse and it all comes back down to normal.

A quick word about temps as it's something I've talked about and worried about a lot.  For the CPU and GPU (the video card) the temps should be between 30 and 50 degrees Celsius when at idle.  Idle means sitting back and not doing much of anything.  It can be web browsing, playing music, and even light Photoshop work.  Putting it under a moderate load (full screen videos, harder Photoshop work, light gaming) should send the temps up between 50 and 70 Celsius.  And under hard load (a major Photoshop task or for the GPU a full screen graphics intensive game) it can go as high as 90 degrees.  As the temps go up, so does the cooling power.  When idle, my fans are barely moving.  When it goes above 55 the fans spin up to about 1/4 to 1/2 speed.  I can't hear this level as my computer case sits to my right at my feet.  When the temps start going above 70 the fans spin up to almost full and it becomes noisy.  And when it hits 90, everything goes to full blown cold mode and does its very best to get it below 85.  If the temps don't steady out or go down and instead continue to rise, the system will finally start to clock the CPU or GPU back... basically making it less effective but saving itself from overheating.
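If it helps to see those bands laid out, here's a rough sketch in Python.  The thresholds are just the numbers from the paragraph above as they apply to my system; the function names and the exact handling of the boundary values are my own assumptions, not anything the motherboard actually exposes.

```python
def expected_load(temp_c: float) -> str:
    """Rough load band a healthy system would be under at this temperature."""
    if temp_c <= 50:
        return "idle"
    if temp_c <= 70:
        return "moderate load"
    if temp_c <= 90:
        return "hard load"
    return "overheating: expect the CPU or GPU to clock back"


def cooling_response(temp_c: float) -> str:
    """Fan behavior at this temperature, per the description above."""
    if temp_c >= 90:
        return "everything at full speed, trying to get back below 85"
    if temp_c > 70:
        return "fans at almost full speed (noisy)"
    if temp_c > 55:
        return "fans at about 1/4 to 1/2 speed"
    return "fans barely moving"


for temp in (35, 60, 80, 90):
    print(temp, expected_load(temp), "->", cooling_response(temp))
```

The point of writing it out is that a reading only means something next to what the system is doing: 60 degrees during light gaming is fine, 60 degrees while sitting at the desktop is not.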

So when I say the AIO CPU Watercooler is flaking out, I mean that when doing nothing that should push the processor its temps go to 90 and it starts to clock itself back.  Since I've been monitoring it closely, I've also seen that the CPU temps are almost always hovering between 45 and 55... not the chilly 30-40 I'd expect from a watercooler.  But even with that problem.... I don't think that's the cause of these crashes.  For one, I haven't seen the system go above 90 degrees.  I have a hardware monitor that I now run constantly and it not only shows the current temp (updated twice a second), it has the minimum and maximum temps.  Looking at it right now shows that one core of the processor topped out at 90, while all the others are between 79 and 80.  And when it hit that 90, I was aware of it without even looking at the monitor as all the fans had already spun up and it sounded loud.  That loud sound has never preceded a crash.  Since the monitoring software seems to be working fine and it would take at least several seconds of very high temps to cause a crash.... I have to assume I'd have heard those fans spinning up.

So... I have two problems.  One is the crashing and one is the AIO CPU watercooler.

If I fully take out the operating system as a culprit (I'm about 90% sure it has nothing to do with Windows), I'm left with pure hardware.  The motherboard, the CPU, the RAM, the Video Card, the Hard Drives, and the Power Supply.  Lemme look at all of these, how they might cause the problem and how to properly troubleshoot it.

RAM
The RAM is probably the most devious but the easiest to troubleshoot.  It's devious because it can be a single 'chip' inside the RAM module or even a single sector on that chip.  So unless an important file hits it, the system will stay up and be stable.  And even if an unimportant file hits it, it will just error correct, take a bit longer than I expect, and move on.  It's only Windows files that will really cause it to crash.  So that explains the random nature of what I've been experiencing.

It can also explain why the problem sped up and is now happening often.  Whatever happened to originally break the RAM has taken out 'more' of it and it's now just hitting bad sectors or chips.

Testing is just as I put above.  Pull one stick of RAM and see if that fixed it.  If it didn't, pull the other stick and put the first one back in.  If that didn't, try both sticks one at a time in different RAM slots.  If it crashes in all of those scenarios, then it's not the RAM.  Or at least not JUST the RAM.
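Here's a small sketch of that swap test as a checklist, mostly to show how the results would be read.  The stick and slot labels are placeholders, and the 'interpret' logic is my own simplification: it assumes each configuration has been run long enough to know whether it crashes.

```python
from itertools import product


def ram_test_plan(sticks, slots):
    """Every single-stick configuration: one stick installed, in one slot."""
    return [(stick, slot) for slot, stick in product(slots, sticks)]


def interpret(results):
    """results maps (stick, slot) -> True if that configuration crashed."""
    if all(results.values()):
        return "crashes in every configuration: not the RAM, or not JUST the RAM"
    if not any(results.values()):
        return "no crashes with a single stick: inconclusive, keep testing"
    sticks = {stick for stick, _ in results}
    slots = {slot for _, slot in results}
    for stick in sorted(sticks):
        # Crashes happen exactly when this stick is installed.
        if all(crashed == (s == stick) for (s, _), crashed in results.items()):
            return f"crashes follow {stick}: suspect that module"
    for slot in sorted(slots):
        # Crashes happen exactly when this slot is in use.
        if all(crashed == (sl == slot) for (_, sl), crashed in results.items()):
            return f"crashes follow {slot}: suspect that slot on the motherboard"
    return "mixed pattern: inconclusive"


plan = ram_test_plan(["stick 1", "stick 2"], ["slot A", "slot B"])
# Hypothetical outcome: only configurations using stick 2 crash.
observed = {config: config[0] == "stick 2" for config in plan}
print(interpret(observed))
```

With real hardware each entry in the plan is hours or days of waiting for a crash, which is the whole problem.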

CPU
The CPU is incredibly problematic as a cause and incredibly problematic to troubleshoot.  The CPU is the brain of the computer so it's the most likely culprit in one way.  The only problem is that when the CPU goes bad it's normally a very short trip from 'problems' to 'not booting'.  This has been months now.  The heating up quickly with the AIO watercooler could have caused a problem... but again that rarely stays problematic and almost always devolves into dead.  Troubleshooting?  Buy a new CPU and plop it in.  There really isn't another way unless I had a spare computer with a compatible motherboard to pull this one out and set it in.  I don't.

Video Card
The very nature of this problem makes the video card unlikely.  But it's still possible.  It's obviously generating all the pictures on the screen, and it has its own RAM and processing unit (GPU).  But about the only way it could cause a problem like this is to have a power spike and burn out part of the motherboard or other components.  In other words, it could have caused something else to break and now I'm seeing the results of that second part failing.

I come to that conclusion because there's no common video problem before the crashes.  And if the card itself failed the system would hum along in the background while I'd be seeing a black screen.

Troubleshooting it is even easier as I have onboard graphics.  I can simply go into the BIOS, turn on the built in video out and plug my monitor into the motherboard instead of the video card.  I could also pull the video card and try it in another system or pull it out and drop a new/different video card into my system.  Easy Peasy.  But again, I don't think it's the video card so I don't see value in troubleshooting it first.

Power Supply
This one is insidious.  The power supply is all that stands between all that electricity in the wall and my delicate components.  A power spike from the wall can easily damage the power supply's ability to send out smooth safe electricity to the motherboard, CPU, Video Card... everything.  The power supply itself could cause these reboots by simply dropping below the acceptable power level and that would make the CPU shut down which in turn would freeze the computer and force it to reboot.  But it could also cause these problems by sending out too much power... it could make its own power spike that would burn out other components.

I call it insidious because I've fought against a bad power supply before.  Let's say it spikes the RAM.  I troubleshoot and find out that both RAM modules are bad.  I get happy as I've found the problem and can buy new modules to replace them.  But then a little later another problem comes up.  This time it spiked the CPU.  I go through the steps with the RAM and find out that they're not causing the problem.  I move on to the CPU but while I'm testing a different CPU in the system, it spikes the RAM again.  As the problems continue I come to the conclusion that the CPU wasn't the problem and send the new CPU back and put the 'bad' CPU back into the system.

See the problem?  A bad Power Supply can kill and kill and kill even while troubleshooting.  And even if it doesn't get into that frustrating cycle, it can still make me replace just about everything thinking that it's just one bad part after another.  I chased a power supply problem for 18 months before!

Troubleshooting is also problematic.  To do it right, you remove the power supply from the system, put it onto a bench, pull it apart, and start testing the actual connections with a multimeter and oscilloscope.  I don't have those tools though and even if I did I don't have the ability to read them.  So my troubleshooting method would be to pull it and replace it (I wouldn't trust putting it into a different system!).  But if the problem persists, I'd have to continue to troubleshoot with the new power supply as the old one might have torched a component or two.

Now there's one big major reason I don't believe I have a bad power supply.   I currently have a SeaSonic 650W 80 Plus Gold, and SeaSonic makes hands down the very best power supplies.  It has a 10 year warranty, and they offer those because these babies just don't break without going into a full on failure.

Hard Drives
Technically in my system if this problem were coming from a hard drive it would be my Samsung 840 Pro SSD.  If it were flaking out it would easily kill Windows.  It's possible that it's in a cascade style failure and its own repair ability can't keep up... but again this would look and feel like a Windows failure.  The fact that the system crashed in the BIOS, which wouldn't even touch the drives, leads me to believe it's not one of those.

Motherboard
The mother of all problems.  Like the power supply, when a motherboard goes bad it can break other parts.  Far more common though is a temperature/time issue.  One of the many MANY connections on the board screws up and when it's hot enough for long enough it fails.  It would only take seconds for it to cool down enough to make the connection again, so the crash and reboot cycle would 'fix' the problem temporarily.

Now when I say temps, I'm not talking about something I can easily monitor.  This wouldn't be the CPU temp or the GPU temp or even the Chassis temp.  It would be the temperature at a very particular point.  Maybe right near a fan or just outside the reach of a fan.  A small area that exceeds 150 degrees Celsius and melts the contact.

Troubleshooting?  Yeah right.  There is no easy way to troubleshoot a bad motherboard because like the power supply it can break other components.  If I suspect it's a bad motherboard I'd just replace it and then test each component individually to make sure they weren't broken.  And to be honest, I'd never replace just a motherboard or a power supply.  Since they create such similar problems I couldn't decide if it's one or the other so I'd replace both.



As if all of that wasn't enough of a headache... almost all of these problems can cause ongoing issues with Windows.  So to properly troubleshoot them I'd want a clean install of Windows.  So troubleshooting this particular problem would look something like this:

  • Back up my data (as much of it to the cloud as I can, the rest to a hard drive that I'd pull out of the system).
  • Format the SSD 
  • Reinstall Windows (watching for failures during the install)
  • Start using Windows normally and wait for a crash
  • Pull a RAM module and test again
  • Swap the RAM modules and test again
  • Move the RAM module to a different, previously unused, slot and test again
  • Pull the video card and use the on-board video.  
  • Pull and replace the CPU (and while I was at it, the AIO watercooler)
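The list above is really a process of elimination, and it can be sketched in a few lines of Python.  This is only an illustration: the step descriptions paraphrase the list, 'still_crashes' stands in for me using the machine and waiting, and the fallback assumes that anything surviving every step points at the two parts the list can't test (motherboard and power supply).

```python
from typing import Callable, List, Tuple

# (change made to the system, what is implicated if the crashes stop after it)
STEPS: List[Tuple[str, str]] = [
    ("clean install of Windows", "the old Windows install"),
    ("run with one RAM module pulled", "the RAM module that was pulled"),
    ("swap to the other RAM module", "the RAM module that was just removed"),
    ("move the module to an unused slot", "the RAM slot it was in"),
    ("pull the video card, use on-board video", "the video card"),
    ("replace the CPU and AIO watercooler", "the CPU or its cooler"),
]


def find_suspect(still_crashes: Callable[[str], bool]) -> str:
    """Walk the steps in order; the first change that stops the crashes names the suspect."""
    for change, suspect in STEPS:
        if not still_crashes(change):
            return suspect
    return "motherboard or power supply (nothing above fixed it)"


# Hypothetical run: the crashes only stop once the video card is out.
print(find_suspect(lambda change: "video card" not in change))
```

The catch is that 'still_crashes' has no quick answer; a quiet few days after a change doesn't prove the change fixed anything.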

Each step would be frustratingly slow.  Simply reinstalling Windows could take out a lot of the crashes, but that doesn't mean it's fixed.  I'd have to simply keep using it until it failed again and that testing step could last a week!  Doing that with the RAM means moving down to half the RAM I'm used to.  Doing the Video Card means going to a minimal resolution and not playing any games.  If I got all the way to the processor I'd be looking at months of testing.

Now that's how I'd troubleshoot while still trying to use the computer.  If I wanted to do it right I'd have to commit to NOT using it while troubleshooting each step.  That way I could put some software on that stresses the system and more or less makes it crash quickly.  I could do the RAM and Video card in a single day.  But I don't remember the last time I've had a single day to myself without wanting to use the computer.  So realistically I couldn't do this in a few days, and it might still take weeks or months.


The more I've thought about it this week, the more I've come to the conclusion that I should just trash the parts I'm suspicious about and start with another build.  The 2018 Build.  I'd have to replace the motherboard, processor, RAM, and power supply.  If I got a SINGLE crash after replacing those, I'd then have to trash the video card and hard drives too.  But even just the motherboard/processor/RAM is expensive.  And I'd still have to take most of a day once the parts get here to take apart my system, put the new parts in, format the drive, and install Windows clean.

That thought has led me to a conclusion that I'm not happy with, but might be the best way going forward.

Stop Making Computers

When I build the computers I get some joy out of it and a lot of pride... but I'm also the tech support.  I don't have someone to call.  I don't have someone to just send the system to to get fixed.  I'm left with this very possibility... a computer with an annoying disabling problem and a bunch of expensive parts that are all out of warranty.  

Now don't get me wrong, I'm not thinking about going down to Best Buy or going online to Dell and getting any kind of 'average' computer.  There are plenty of enthusiast companies that make computers like I would.  They use the types of parts I would and even use the brands that I would.  I'd pay a very pretty penny for them to do this, but I'd end up with a computer as good as I'd build that comes with warranty support.  If I order it today I can have a new computer by next weekend.  

I've only seriously looked at Origin PC, but their pricing is indicative of the other companies I'd like, and the one I built last night with upgrades across the board would cost about $3600.  That's with Nvidia's newest Video Card (the RTX 2080), the current day version of my processor (Intel i5 8600K), an M.2 NVMe 500GB SSD (Samsung 970 EVO), a three year warranty, lifetime support, and an upgrade path for 3 years where I can trade in parts and get newer versions for vastly reduced prices.

So... the more I think about it, the more I think I should just bite the bullet and get out of the computer tech business.  Let the pros do it and pay them to do it for me.  That stinging sensation I'm feeling?  That's just my pride.  And as Marsellus Wallace once famously said...

Fuck Pride.
Pride Only Hurts.
It Never Helps.
You Fight Through That Shit.
'Cause A Year From Now...
When You Kickin' It And Someone Else Is Fixin' Your Computer...
You're Gonna Say To Yourself "Marsellus Wallace Was Right"

https://www.youtube.com/watch?v=ruhFmBrl4GM
