Reballing GPUs seems to be what all the cool kids are into these days. They make the claim that a dead GPU can be repaired by reballing it. What they aren’t telling you, or don’t realise themselves, is that it neither needed reballing nor is it now repaired. Maybe it’s the easy money or maybe they just like the temporary smile on the customer’s face. Let’s not beat about the bush though – it’s bollocks, so don’t pay for it.
When a high performance chip, such as a GPU, is working hard it heats up. It heats up because it’s drawing high current and the energy dissipates as heat. When it’s off or idle it cools down. Over time this constant change in temperature causes its internal circuitry to start decaying. At high temperatures the tracks (or ‘bumps’) in a flip-chip design BGA chip physically deform and you end up with a dead chip.
If this chip is in your beloved laptop you probably don’t want to buy a new laptop; instead you want it repaired. You take it to your local repair shop and they tell you ‘sure, no problem, it’ll just be the solder balls, we’ll reball it for you no problem!’.
Wait, what? What they just told you is not that the chip is dead and needs replacing, no no no, what they told you is that the lead-free solder balls which connect a BGA (such as a GPU) to the board are probably at fault here. These solder balls melt at around 220°C – if your GPU had actually gotten to that temperature it would have melted like the reactor at Chernobyl. In fact, it would have gone into thermal protection mode way before then. They could also claim the solder balls have cracked, a.k.a. a dry solder joint. This is highly unlikely unless there has been physical damage, such as dropping your laptop from a great height. Make no mistake, these solder balls are tough little beasties and very, very rarely fail.
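To put rough numbers on that gap, here is a minimal Python sketch. Only the 220°C melting point comes from the text above; the load temperature and the thermal-protection threshold are assumed, illustrative figures (check the datasheet for your own GPU), and `margin_to_melting` is a hypothetical helper name.

```python
# Rough temperature-margin check. Only the 220 °C solder melting point comes
# from the article; the other two figures are assumed, illustrative values.
SOLDER_MELT_C = 220.0        # lead-free solder balls, per the article
ASSUMED_LOAD_C = 90.0        # hypothetical die temperature under heavy load
ASSUMED_SHUTDOWN_C = 105.0   # hypothetical thermal-protection threshold


def margin_to_melting(temp_c: float, melt_c: float = SOLDER_MELT_C) -> float:
    """Return how many degrees C temp_c sits below the solder melting point."""
    return melt_c - temp_c


if __name__ == "__main__":
    readings = [
        ("heavy load", ASSUMED_LOAD_C),
        ("thermal shutdown", ASSUMED_SHUTDOWN_C),
    ]
    for label, temp in readings:
        gap = margin_to_melting(temp)
        print(f"{label}: {temp:.0f} °C, which is {gap:.0f} °C short of melting")
```

Even at the assumed shutdown threshold the chip is still more than 100°C short of melting its solder balls, which is the whole point.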
You trust the repair shop though and because you are none the wiser you go along with it. They take your board, apply massive amounts of heat to the chip to liquify the solder balls, take the chip off, apply new leaded solder balls and reattach it to the board. Next the big test – it powers up and works… Wooohooo! It’s fixed they say… everyone’s happy. That’ll be £150 please, have a lovely evening.
What just happened? Well, here’s the fun part. The process required to reball has fixed the problem, not the new solder balls. The chances are there was nothing wrong with the solder balls in the first place.
When you heat the dead chip in order to melt the solder balls you are effectively causing the internal tracks to realign, or rather be nudged back into place so they just so happen to work again. If you were to apply 120°C to the chip, i.e. 100°C less than is required to melt the solder balls, for around 5 minutes you would achieve the same result. The chip may well now function but it is not repaired. It’s in a fragile state. Once the chip is put under heavy load for a period of time and its internal temperature increases it will fail again. For a moderate user, such as someone who browses the Internet and runs other non-graphics-intensive workloads, the ‘repair’ may well last a few months. However, if you are gaming or editing video (i.e. doing something which uses the GPU heavily for processing), this chip will fail again very quickly. The chip is dying.
Here is some testing performed by Amkor.
Amkor’s plot shows an improvement in lifetime for copper (Cu) pillar over SnAg bump under the same current and temperature conditions and with similar bump and UBM geometry.
| Test | Test Conditions | Read Point | SS | Results |
| MSL3 | 30 / 60-192 | 260°C 3X | 77 x 12 lots | PASS |
| T/C B | -55°C / 125°C | 1000X | 77 x 3 lots | PASS |
| HAST | 130°C / 85% RH | 96 hrs | 77 x 3 lots | PASS |
| T&H | 85°C / 85% RH | 1000 hrs | 77 x 3 lots | PASS |
| HTS | 150°C | 1000 hrs | 77 x 3 lots | PASS |
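For a rough feel of what the T/C B row means in everyday terms, thermal cycling results are often scaled with a simplified Coffin-Manson relation, where the acceleration factor is (ΔT_test / ΔT_use)^n. The sketch below is not Amkor’s method. The idle and load temperatures and the exponent n = 2 are assumptions picked purely for illustration, and real exponents depend on the material and the failure mechanism.

```python
# Simplified Coffin-Manson scaling of a temperature-cycling test.
# The test figures come from the T/C B row; everything marked ASSUMED is
# an illustrative guess, not a measured or published value.

def coffin_manson_af(delta_t_test: float, delta_t_use: float, exponent: float) -> float:
    """Acceleration factor (dT_test / dT_use) ** exponent."""
    if delta_t_test <= 0 or delta_t_use <= 0:
        raise ValueError("temperature swings must be positive")
    return (delta_t_test / delta_t_use) ** exponent


TEST_LOW_C, TEST_HIGH_C = -55.0, 125.0   # T/C B test conditions
TEST_CYCLES = 1000                       # T/C B read point

ASSUMED_IDLE_C = 40.0     # hypothetical idle die temperature
ASSUMED_LOAD_C = 90.0     # hypothetical die temperature under load
ASSUMED_EXPONENT = 2.0    # hypothetical fatigue exponent n

delta_t_test = TEST_HIGH_C - TEST_LOW_C        # swing during the test
delta_t_use = ASSUMED_LOAD_C - ASSUMED_IDLE_C  # swing in assumed daily use

af = coffin_manson_af(delta_t_test, delta_t_use, ASSUMED_EXPONENT)
equivalent_cycles = TEST_CYCLES * af

print(f"Acceleration factor: {af:.2f}")
print(f"Equivalent everyday cycles: {equivalent_cycles:.0f}")
```

With those assumed numbers, 1000 test cycles work out to roughly 13,000 of the gentler everyday cycles. Change the exponent or the temperature swing and the answer moves a lot, so treat it as a ballpark and nothing more.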
Of course the Amkor testing is actually a sales pitch for their new copper technology, so it shows longer-than-expected times to failure.
The only way you will fix this laptop is by replacing the GPU chip with either a new chip (if available) or a known good working one from a donor board. This is where reballing is valid – when replacing the chip.
A side effect of reballing an existing chip is you have also unnecessarily heated it way beyond its operating temperature again, so if you repeat this process you are killing it even faster.
Heating a dead GPU for a temporary fix is therefore only valid if the customer understands entirely that this is NOT a repair, this is NOT a permanent fix and they should NOT be mad when it breaks in a few days’ time. Do NOT charge them a premium rate for your amazing reballing skills and do NOT make any false promises. In fact, just say NO, you cannot repair it; they need a new machine.
I think you get the point.