There’s been a lot of talk about artificial intelligence lately, but what I find most interesting about AI is something no one ever talks about: we have no idea why these models work as well as they do. I find this a very interesting problem, because I think if we understand it, it will also tell us something about how the human brain works. Let’s take a look.

By admin

35 thoughts on “AI’s Dirty Little Secret”
  1. In reality, the AI revolution happened because there was a single person outperforming whole teams on an AI-training website. The reason why is exactly why double descent happens.

    Don't walk off with other people's work, that's why.

  2. I once asked AI to write a simple Python script for a project and it did it perfectly. Then I asked AI to
    put 'this script in a box' (I meant a copy/paste box) so I could paste it into deCoda… and it did it literally:

    ---------------
    | This Script |
    ---------------

    I laughed for days

  3. You can get AI to explain its reasoning. It's rather easy to get it to display the vectors used and the algorithms applied; understanding it, though, is a whole other thing.

  4. Sadly, I think the large language models actually ARE overfitting, because the part of our brain that uses language works like an overfitted model. There is a very specific pool of words that we can use after each word, which most of the time depends on the context, and choosing a wrong word spoils the flow immediately. So we simply want the LLMs to overfit in a useful way, just like humans speak.

  5. I can try to guess why this might happen. When we train a simple neural network, it generalizes from the data we have to predict data we don't have. For example, if we train on the Titanic dataset, we are trying to generalize the patterns that caused a person to survive. But we cannot consider all possible cases. If we had an infinite Titanic carrying all the people on Earth, we could find the full set of patterns, and our function (the neural network) would reproduce all the rises and falls of our data on the graph. So all our data would be perfectly represented by our model. But the Titanic is relatively small compared to all of humanity. An LLM, on the other hand, corresponds to the entirety of human language and the human world described by that language, because I suppose that in five thousand books one can find the entirety of the language. So because LLMs better fit the fullness of the data, they better approximate the whole real situation, and they don't need to generalize it. This is why overfitting doesn't count anymore. So I guess it's an effect of the scale of the available data.
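
    A minimal Python sketch of that scale argument (toy data, illustrative only; the sine curve and polynomial degree are assumptions, not anything from the video): the same over-sized model that memorizes a small sample gets pushed toward the underlying pattern once the sample covers it densely.

    import numpy as np

    rng = np.random.default_rng(0)

    def test_mse(n_train, degree=12, n_test=2000):
        """Fit an over-sized polynomial to n_train noisy samples of sin(3x)
        and return its mean squared error on fresh test points."""
        x_train = rng.uniform(-1, 1, n_train)
        y_train = np.sin(3 * x_train) + 0.1 * rng.standard_normal(n_train)
        coeffs = np.polyfit(x_train, y_train, degree)       # high-capacity model
        x_test = rng.uniform(-1, 1, n_test)
        return np.mean((np.polyval(coeffs, x_test) - np.sin(3 * x_test)) ** 2)

    for n in (20, 200, 2000, 20000):
        print(f"{n:6d} training points -> test MSE {test_mse(n):.4f}")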

  6. The important thing to remember when developing AI is how to turn off its electricity supply. The difficulty will come from being able to retain control of that supply.

  7. After 01:30 – Douglas Adams wrote a series of novels, starting in the mid-1970's, which were based upon a computer being asked the wrong question. The consequences included the destruction of the Earth. How prescient of him!

  8. The biggest problem of all with current AI is that people actually expect it to be intelligent, when it definitely is not.
    Current "AI" is just a very complicated pattern-finder and matcher. It's a complicated word and phrase shuffler. It's an instrument which attempts to find a pattern which matches your request. The only difference between AI art, AI stories or AI driven chatting is in how the output is represented. The goal of the AI is the same in any case: Find something which matches your request.
    Where AI falls down is when it doesn't know what matches. The trouble is that it doesn't have any concept of "I don't know" and so even if it can't fulfil your request, it will still come up with something which, at first glance, appears to do so. Once you examine its output critically, you discover the problems which, at best, show that it was the product of an AI rather than from a human mind and, at worst, make the output useless for your stated purpose.
    AI can be useful, but only if you keep in mind that it can't actually think, that it doesn't actually "know" anything, and that it will provide an output even if that output is nonsense because it doesn't have the information it needs in order to satisfy your requirements.
    Current AI will never tell you "I'm sorry, Dave, but I'm afraid I can't do that." Who knew that that could be a bad thing? 😏

  9. There are a couple of minor inaccuracies in this video:
    3:26 While talking about inference, the video shows backpropagation during training.
    4:01 horizontal and vertical axes are swapped in the verbal description of the graph.

  10. Come on, this is embarrassing: this double descent does not exist. Period. It's just that the metric for complexity is wrong; you can't just count the number of weights, because the architecture also matters. As simple as that.
    Please think before posting such nonsense. Thank you.

  11. That they can't tell where AI gets its answers from is a lie. The reason they lie is that admitting where the answers came from would cause an explosion of copyright lawsuits.

  12. Neural nets are classifiers and not necessarily predictors. The classification can then be interpreted as a "prediction" through an output node function, but the neural network is still a classifier, and therefore overfitting and underfitting are not a mystery. A neural network should be trained so that it is neither overfit nor underfit, that is, so that it is able to generalize and determine the correct outputs for untrained inputs.
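
    A minimal sketch of that view (scikit-learn, with a toy two-class dataset and hidden sizes chosen purely for illustration): judge each classifier by how well it labels inputs it was never trained on, and pick a capacity that neither underfits nor overfits.

    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=2000, noise=0.25, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    for hidden in (1, 4, 32, 256):       # too small underfits, too large risks overfitting
        clf = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=2000, random_state=0)
        clf.fit(X_train, y_train)
        print(f"hidden units {hidden:4d}: "
              f"train acc {clf.score(X_train, y_train):.3f}, "
              f"held-out acc {clf.score(X_val, y_val):.3f}")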

  13. Huh? I thought it was obvious. I didn't know anyone was asking this question because it makes perfect sense to me. Maybe I've been wrong all this time, but I always expect overfitting to be overcome with more stable rules.

    Say I have a machine trained to look at the color of apples. If the training data is just two apples, the machine only needs one bit of information to overfit and perform perfectly: first one and then the other. That's much easier for the machine than trying to use a sensor to detect the color. If I then add a few million apples of data, the machine has a new easy choice: it takes much more effort to remember the order of millions of things than to just use the sensor each time and make a good guess.

    Basically overfitting is expensive for larger datasets.
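
    A toy Python sketch of the apple example (hypothetical names and numbers): the lookup-table "memorizer" grows with the dataset and has no answer for an apple it hasn't seen, while the one-threshold sensor rule stays tiny and generalizes.

    import random
    random.seed(1)

    def true_color(reading):                 # ground truth the sensor rule can recover
        return "red" if reading > 0.5 else "green"

    # Training apples: (id, sensor reading) pairs.
    train = [(i, random.random()) for i in range(100_000)]
    memorizer = {apple_id: true_color(r) for apple_id, r in train}   # stores every single case

    def sensor_rule(reading):                # constant-size model: just read the sensor
        return "red" if reading > 0.5 else "green"

    new_id, new_reading = 100_001, 0.73      # an apple never seen in training
    print("memorizer says:", memorizer.get(new_id, "no idea"))   # -> no idea
    print("rule says     :", sensor_rule(new_reading))           # -> red
    print("memorizer stores", len(memorizer), "entries; the rule stores one threshold")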

  14. I did my Masters on neural networks in the mid-90s, and I saw what's described here as overfitting. To me it was mostly because large networks were trained with lots of data. The thing is, each training round results in an error that is later fed back into the network for the next round, and ideally each round would produce a smaller error.
    The network I trained was used to cover gaps in instrument signals, with no input other than the data preceding the gap. The longer the input before the gap, the better, except that in some cases things weren't predictable at all.
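
    A rough sketch of that gap-filling setup (numpy, a synthetic signal, and a plain linear autoregressive fit standing in for the commenter's network): predict each missing sample from the p samples before it, then roll the prediction forward across the gap.

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.arange(1200)
    signal = np.sin(2 * np.pi * t / 50) + 0.05 * rng.standard_normal(t.size)

    history, gap_len, p = signal[:1000], 30, 20      # data before the gap, gap size, window length

    # Training pairs: each window of p samples predicts the sample right after it.
    X = np.array([history[i:i + p] for i in range(len(history) - p)])
    y = history[p:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)        # least-squares stand-in for the network

    window = list(history[-p:])
    filled = []
    for _ in range(gap_len):                         # predict, then feed the prediction back in
        nxt = float(np.dot(w, window))
        filled.append(nxt)
        window = window[1:] + [nxt]

    truth = signal[1000:1000 + gap_len]
    print("RMS error across the gap:", np.sqrt(np.mean((np.array(filled) - truth) ** 2)))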

  15. Here's an example of her observation. I'm an investor, and years ago, since markets move in cycles, I tried using Fourier analysis on historical stock data to predict future moves. It was a complete failure: the more points I used, the more wild and extreme the next step became. Newton's first law is all we have. Decisions are not well made with huge data and consensus… they are made with insight and commitment.
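
    A rough sketch of that experiment (numpy, with a synthetic random walk in place of real stock data): fit the history with k cosine/sine pairs and evaluate one step past the end. Because the basis repeats with the window length, the "forecast" is driven entirely by the fitted harmonics rather than by anything the next price actually depends on, which is one way the approach falls apart.

    import numpy as np

    rng = np.random.default_rng(42)
    prices = np.cumsum(rng.standard_normal(256))     # toy "price" history: a random walk
    n = prices.size

    def fourier_basis(t, k, period):
        """Design matrix: a constant plus k cosine/sine pairs of the given base period."""
        cols = [np.ones_like(t, dtype=float)]
        for j in range(1, k + 1):
            cols.append(np.cos(2 * np.pi * j * t / period))
            cols.append(np.sin(2 * np.pi * j * t / period))
        return np.column_stack(cols)

    t_hist = np.arange(n)
    for k in (2, 8, 32, 100):
        coef, *_ = np.linalg.lstsq(fourier_basis(t_hist, k, n), prices, rcond=None)
        forecast = fourier_basis(np.array([n]), k, n) @ coef     # one step past the data
        print(f"k={k:3d} harmonics: next-step forecast {forecast[0]:8.2f}  vs last price {prices[-1]:8.2f}")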

  16. AI seems to be limited to understanding theoretical concepts like math, physics, and chemistry, but it is not able to apply them practically to the real world, because it has never had a second of real-world experience in its life.

  17. Not sure what is not understood; here is how I see it. If the patterns to be learned require a complexity that cannot be projected into the model size, then the model limits what it learns to a subpattern: that's underfitting. If you increase the model size a bit, but not enough, then it looks like the model sacrifices its knowledge for the sake of attempting to see a greater pattern, but fails, which we call overfitting. Obviously the model has no ambition, but the reason is probably related to the existence of several local error minima corresponding to different subpatterns to be learned, which the model does not give up on while still not being big enough either to connect them together or to dive deeper into each subpattern while correctly deciding which one to use.
    An example for me would be a StarCraft player: he knows several simple strategies, and when he consistently uses one in particular and keeps it trained, he performs decently. He knows, however, that in different situations different strategies can be better. So he decides to train three different strategies. But oops: he's not smart enough, or doesn't have enough memory, to maintain all three in depth, or he can maintain all three but then lacks the resources to decide which to use… and hence it was better to stick to one simple, consistent strategy, unless he becomes smarter. However, he's not smart enough to realize his own stupidity, so instead of rolling back to the one pattern, he still tries, unsuccessfully, to dive into each of the strategies, much as the models are stuck trying to optimize several local error minima at once. Obviously we don't implement that "rollback intelligence" into models because we want to know their limits.

  18. I don't imagine reasoning has anything to do with how it uses intuition to arrive at conclusions. Most would fear that, because they can't reach conclusions instinctively and must use rationalization and/or reasoning to reach them.
