Read or listen to the newsletter with all the documents I’ve chosen to host here. Support my learning journey by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo! Discuss this with other Tunadorks on Discord. All my other links.
Timestamps:
00:00 Introduction
00:37 Unexpected benefits of self-modeling in neural systems
01:57 Gemma 2 – Improving open LMs at a practical scale
02:27 Next generation reservoir computing
03:42 Anthropic Circuits Updates – July 2024
04:30 Transformers are universal learners in context
05:47 Revisiting token embedding with…
Great video, would be better without the awful haircut though
Got like 90 accounts with AI music tools that I will never use because their initial free credits churned out horrible samples. The real money is in making a product that works, not slapping "AI" on the website and calling it a day. Sucks that SEO serves up 5 pages of garbage.
A couple years ago I used LeCun's JEPA-based self-modeling to improve next-day weather prediction by 50% (MSE of temperature, precipitation, and winds).
Our method was to predict feature-based representations of the NEXT day using the current day's features.
Basically, you use a ResNet VAE to parse weather data into features, then use those to predict the features of the next day.
The feature prediction loss was then added to total training loss (along with standard full-res reconstruction loss).
What I suspect this did was to reward the model for focusing on "predictable and durable features" across multiple days.
An important observation was that the JEPA predictor sub-net, which used current features to predict next-day features, needed to be "simple" (1-2 hidden conv layers). This likely forced representations to be simple yet useful – say, simple convolutional translations of moving weather patterns.
For example, if you have a "front feature" at a particular location, chances are it will move/geographically translate to the next day, yet remain a "front feature" in JEPA feature space.
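For reference, here is a minimal sketch of the training objective described in this comment: a conv encoder/decoder plus a deliberately small JEPA predictor that maps today's features to a prediction of tomorrow's features, with the feature-prediction loss added to the reconstruction loss. All layer sizes and the stop-gradient on the target features are my assumptions rather than details from the original project, and the variational/KL part of the VAE is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeatherJEPA(nn.Module):
    """Sketch: encoder/decoder plus a small JEPA predictor that maps
    today's feature map to a prediction of tomorrow's feature map."""
    def __init__(self, in_ch=3, feat_ch=32):
        super().__init__()
        # encoder: weather grid (e.g. temp/precip/wind channels) -> feature map
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_ch, 3, stride=2, padding=1),
        )
        # decoder: feature map -> full-resolution reconstruction
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, in_ch, 4, stride=2, padding=1),
        )
        # deliberately simple predictor (1-2 conv layers), as the comment notes
        self.predictor = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )

    def loss(self, day_t, day_t1, pred_weight=1.0):
        z_t = self.encoder(day_t)        # today's features
        z_t1 = self.encoder(day_t1)      # tomorrow's features (the target)
        recon = self.decoder(z_t)
        recon_loss = F.mse_loss(recon, day_t)          # standard full-res reconstruction
        # predict tomorrow's features from today's; detaching the target is my
        # assumption (a common way to discourage collapse), not a stated detail
        pred_loss = F.mse_loss(self.predictor(z_t), z_t1.detach())
        return recon_loss + pred_weight * pred_loss
```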
The project was a low-budget, student-level (me lol) sub-project that didn't seek funding, since the top-level project was already funded. I wasn't aware of the funding details, but the project got consistent interest from ATOC scientists during check-in meetings.
My last week, on our last Google Meets call with a project scientist from an out-of-state uni, he was surprised it was my last week (he wasn't even aware of it before my boss mentioned it to him as he started getting too excited about the project's potential). After that news, he assured me I wouldn't have a problem finding another position. Haha, a year later … lol
Career lesson: It's tough getting in from the outside. Make sure you have fallback positions lined up while you're still employed. It's just like women who value you more for your CURRENT high-value connections than your skills and past success lol
Another observation was how the established scientists were already freaking out 2 years before the end of their contract. Guys, you gotta plan years ahead in this field.
Damn, just downloaded like half that list. Love the curation you do.
The Apple Intelligence paper isn’t too interesting, but have a look at section 5.1: something about adapting to the task at hand on the fly using LoRA. I don’t know of other literature related to this, but it sounds pretty interesting to me.
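For readers unfamiliar with the idea being pointed at: the general pattern is a frozen base model plus small low-rank adapters that can be swapped per task at runtime. The plain-PyTorch sketch below illustrates that pattern only; it is not the Apple paper's implementation, and all names, ranks, and task labels here are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base Linear plus swappable low-rank adapters, one per task."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.rank, self.scale = rank, alpha / rank
        self.adapters = nn.ModuleDict()      # task name -> (A, B) pair
        self.active = None

    def add_adapter(self, task: str):
        in_f, out_f = self.base.in_features, self.base.out_features
        self.adapters[task] = nn.ModuleDict({
            "A": nn.Linear(in_f, self.rank, bias=False),
            "B": nn.Linear(self.rank, out_f, bias=False),
        })
        nn.init.zeros_(self.adapters[task]["B"].weight)  # adapter starts as a no-op

    def set_adapter(self, task: str):
        self.active = task                   # "adapt on the fly": just switch tasks

    def forward(self, x):
        y = self.base(x)
        if self.active is not None:
            ad = self.adapters[self.active]
            y = y + self.scale * ad["B"](ad["A"](x))
        return y

# usage sketch: one adapter per (hypothetical) task, swapped at inference time
layer = LoRALinear(nn.Linear(512, 512))
layer.add_adapter("summarization")
layer.add_adapter("mail_reply")
layer.set_adapter("summarization")
out = layer(torch.randn(1, 512))
```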
Child-language use: blind children?
Surprised you've never heard of -1 bit quantization when it was mentioned several times last year in /g/aicg.
Yo, I got addicted to your channel. I kinda binge-watched your latest vids. I just grab what I can and then guess at the concepts I don't fully understand.
Seems like we could use synthetic data for the blind vision-model problem. They could use Unreal or Unity, armed with a huge pile of models and shaders made by game-dev artists, to set up millions of permutations of complex scenes from different angles, along with labels we could piece together as we assemble each scene, and then train on that (rough sketch below).
I have to assume Musk & Co are doing that sort of thing for their robot training.
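A hypothetical sketch of the permutation-and-label loop being suggested. The `render_scene` stub stands in for whatever Unreal/Unity rendering hook you would actually use (an engine script, or tools like Unity Perception or UnrealCV), so every name and parameter here is illustrative rather than a real engine API.

```python
import itertools
import json
import random
from pathlib import Path

# hypothetical stand-in for an Unreal/Unity render call
def render_scene(config: dict) -> bytes:
    raise NotImplementedError("replace with an actual engine rendering hook")

# illustrative permutation axes; a real pipeline would have many more
OBJECTS   = ["chair", "mug", "robot_arm", "pallet"]
MATERIALS = ["plastic", "metal", "wood"]
LIGHTING  = ["noon", "dusk", "indoor"]
CAMERAS   = [(0, 30), (45, 30), (90, 60)]   # (azimuth, elevation) in degrees

out_dir = Path("synthetic_dataset")
out_dir.mkdir(exist_ok=True)

for i, (obj, mat, light, cam) in enumerate(
        itertools.product(OBJECTS, MATERIALS, LIGHTING, CAMERAS)):
    config = {
        "object": obj, "material": mat, "lighting": light,
        "camera_azimuth": cam[0], "camera_elevation": cam[1],
        "jitter_seed": random.randint(0, 10**6),
    }
    image_bytes = render_scene(config)                  # render this permutation
    (out_dir / f"{i:06d}.png").write_bytes(image_bytes)
    # the label is assembled from the scene config itself, as the comment suggests
    (out_dir / f"{i:06d}.json").write_text(json.dumps(config))
```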
Really like the first paper shown. It's interesting how introducing self-modeling has the consequence of also simplifying the network. I mean, it makes sense that the model would want to be simpler in order to optimally compute itself. I do wonder what effect the self-modeling has besides that, though: is the primary effect the simplification of the network during training, or does the auxiliary task of predicting internal states assist with the primary task in a meaningful way? Judging from the paper, it seems accuracy on the task actually drops slightly (although MNIST is such a simple classification example that I'm not sure that says anything about performance anyway). Really interested to hear more about this strategy in larger models.
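For context, a rough sketch of the auxiliary self-modeling setup being discussed: a classifier that also predicts its own hidden activations, with that prediction error added to the task loss. The architecture, the choice of which layer to predict, the detach on the target, and the loss weight are all my assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfModelingMLP(nn.Module):
    """MNIST-style classifier with an auxiliary head that predicts the
    network's own first-hidden-layer activations (the 'self-model')."""
    def __init__(self, in_dim=784, h1=256, h2=128, n_classes=10):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Linear(in_dim, h1), nn.ReLU())
        self.layer2 = nn.Sequential(nn.Linear(h1, h2), nn.ReLU())
        self.classifier = nn.Linear(h2, n_classes)
        self.self_model = nn.Linear(h2, h1)   # aux head: predict layer-1 activations

    def forward(self, x):
        a1 = self.layer1(x)
        a2 = self.layer2(a1)
        return self.classifier(a2), self.self_model(a2), a1

def loss_fn(logits, a1_pred, a1, labels, aux_weight=1.0):
    task_loss = F.cross_entropy(logits, labels)
    # auxiliary loss: predict your own activations; detaching the target is my
    # assumption, so the pressure is on making the activations easier to predict
    self_loss = F.mse_loss(a1_pred, a1.detach())
    return task_loss + aux_weight * self_loss
```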
11:10: oh, cool, this sounds similar to something I was daydreaming about (except I was imagining clusters of a handful of tokens, not necessarily matching sentence boundaries, and I was imagining doing this recursively).
Like, I imagine this is like: have an autoencoder that goes from a not-too-long sequence of tokens to a single higher-level token, and then the decoder part predicts the individual tokens given the previous higher-level tokens, the current higher-level token, and the base-level tokens already produced corresponding to the current higher-level token?
I suppose their tokens encoding entire sentences can’t be using a fixed discrete set of tokens for the higher level tokens, so, I guess they just have those be continuous?
(Aside: hm, if you used a standard decoder-only LLM, but instead of selecting a token with the probabilities it assigns, just took the average of the embedding vectors for each of those tokens, and let that iterate a dozen times, and then switched to picking specific tokens again, I wonder what kind of garbage output that would produce?
That thought probably seems pretty unrelated. It came to mind because I was thinking about how, when the “tokens” produced as outputs, are continuous, you don’t get a probability distribution, so the only way to mix between options is to mix the actual options, rather than a probability mix of options.)
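The parenthetical aside above is easy to try. Here is a sketch, assuming a GPT-2 checkpoint from Hugging Face: for a dozen steps, instead of sampling a token, append the probability-weighted average of the token embeddings via `inputs_embeds`, then switch back to picking discrete tokens. The prompt and step counts are arbitrary.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
emb = model.get_input_embeddings().weight            # (vocab, d_model)

prompt = "The strangest thing about continuous tokens is"
ids = tok(prompt, return_tensors="pt").input_ids
x = emb[ids]                                          # start from real token embeddings

with torch.no_grad():
    # a dozen "soft" steps: append the probability-weighted average embedding
    for _ in range(12):
        logits = model(inputs_embeds=x).logits[:, -1]     # next-token distribution
        probs = torch.softmax(logits, dim=-1)
        soft_tok = probs @ emb                            # mix of embeddings, not a token
        x = torch.cat([x, soft_tok.unsqueeze(1)], dim=1)

    # then switch back to picking specific tokens, to see what comes out
    gen_ids = []
    for _ in range(20):
        logits = model(inputs_embeds=x).logits[:, -1]
        next_id = torch.argmax(logits, dim=-1)
        gen_ids.append(next_id.item())
        x = torch.cat([x, emb[next_id].unsqueeze(1)], dim=1)

print(tok.decode(gen_ids))
```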
Another idea I had in relation to this was that maybe the encoding for a cluster of tokens could have two parts: one which is only used when decoding to try to get the particular tokens back, and one which is used for that but also used when predicting the next higher-level token. The idea being that this might encourage it to separate the parts that matter significantly later in the text from irrelevant accidents of phrasing. Perhaps somewhat of a semantics vs phrasing distinction… but probably not quite, because the phrasing at one point probably helps predict the phrasing at a later point, due to stuff like different writing styles, etc., so probably not a clean split.
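One way to read the scheme sketched in this comment thread, as a toy: fixed-size chunks of base tokens are each encoded into one continuous higher-level vector, part of that vector feeds a higher-level next-chunk predictor, and the full vector conditions an autoregressive decoder that reconstructs the chunk's tokens. This is a heavily simplified sketch under my own assumptions (the decoder here is not conditioned on the previous chunks' vectors, targets are detached to avoid trivial collapse, and all sizes are arbitrary), not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChunkAutoencoder(nn.Module):
    """Toy hierarchical autoencoder: chunk of base tokens -> one continuous
    higher-level vector -> autoregressive reconstruction of the chunk,
    with part of the chunk vector reserved for next-chunk prediction."""
    def __init__(self, vocab=1000, d=128, chunk_len=8):
        super().__init__()
        self.chunk_len = chunk_len
        self.tok_emb = nn.Embedding(vocab, d)
        self.chunk_enc = nn.GRU(d, d, batch_first=True)            # chunk -> continuous vector
        self.high_pred = nn.GRU(d // 2, d // 2, batch_first=True)  # runs over chunk vectors
        self.dec = nn.GRU(d, d, batch_first=True)                  # reconstructs base tokens
        self.out = nn.Linear(d, vocab)

    def forward(self, tokens):                                  # (B, n_chunks * chunk_len)
        B = tokens.size(0)
        chunks = tokens.view(B, -1, self.chunk_len)             # (B, n_chunks, chunk_len)
        n_chunks = chunks.size(1)

        # encode each chunk into one continuous higher-level vector
        flat = self.tok_emb(chunks.view(-1, self.chunk_len))    # (B*n_chunks, L, d)
        _, h = self.chunk_enc(flat)
        chunk_vecs = h[-1].view(B, n_chunks, -1)                # (B, n_chunks, d)

        # split the code: first half is "prediction-relevant", second half decode-only
        pred_part, _dec_only = chunk_vecs.chunk(2, dim=-1)

        # higher-level model predicts the next chunk's prediction-relevant part
        pred_next, _ = self.high_pred(pred_part[:, :-1])
        high_loss = F.mse_loss(pred_next, pred_part[:, 1:].detach())

        # decode each chunk's tokens conditioned on its full chunk vector
        dec_in = self.tok_emb(chunks[:, :, :-1].reshape(-1, self.chunk_len - 1))
        h0 = chunk_vecs.reshape(1, B * n_chunks, -1)            # init decoder hidden state
        dec_out, _ = self.dec(dec_in, h0)
        logits = self.out(dec_out)                              # predict tokens 1..L-1
        recon_loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            chunks[:, :, 1:].reshape(-1))
        return recon_loss + high_loss
```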
Very nice review. That first paper got my attention!
Do you ever take this information and rework it into multi-dimensional frameworks when you come across new information? I watch your videos, find the original source, and interpret it into my own AI frameworks in many different formats and from many sources of data. Was just wondering if anyone else does that? 😊
first paper out of the gate sounds like a winner 🤯
Skimming abstracts, I love it! Have some engagement.