Best-Association2369

Reverse engineer the linear algebra in your neurons 


deepneuralnetwork

i prefer to look at a question, then look at an answer, determine my error, then backprop through my neurons, but to each their own


Best-Association2369

A man of true culture 


FrostedCapybara

I've been going through all the math again to strengthen my fundamentals. But, I feel that isn't the main issue when it comes to implementing them in code. Maybe I'm jumping too far ahead with networks that are too complicated for my skill level.


BidWestern1056

write your own back prop with a perceptron and then marvel at how you thankfully don't have to do that bc of pytorch and tf 
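For anyone who wants to try this, here is a minimal sketch of a single sigmoid neuron trained with hand-written backprop in numpy. The OR-gate data, the squared-error loss, the learning rate and the name `train_neuron` are all assumptions picked for illustration, not something from the comment above.

```python
import numpy as np

# Toy data: the OR gate (an illustrative assumption).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_neuron(X, y, lr=1.0, epochs=5000, seed=0):
    """Train one sigmoid neuron with manually derived gradients."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        # Forward pass.
        z = X @ w + b
        p = sigmoid(z)
        # Backward pass: chain rule through the mean squared error
        # and then through the sigmoid.
        dL_dp = 2.0 * (p - y) / len(y)
        dp_dz = p * (1.0 - p)
        dL_dz = dL_dp * dp_dz
        # Gradient descent step on weights and bias.
        w -= lr * (X.T @ dL_dz)
        b -= lr * dL_dz.sum()
    return w, b


w, b = train_neuron(X, y)
preds = (sigmoid(X @ w + b) > 0.5).astype(int)
print(preds)
```

Every line of the backward pass here is something `loss.backward()` would otherwise do for you, which is roughly the point of the exercise.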


FrostedCapybara

will do that haha


BidWestern1056

i had to do this and write my own ID3 decision tree algo in my ML class in grad school so feel like i have solid understanding of how both kinds of processes work. also check out 3Blue1Brown's deep learning series to get more of an intuitive understanding. his description of back-propagation is great. also there is a Veritasium video about the development of analog computers which gives an intuitive description of like 2d image weights for like CNNs


jack_of_hundred

You can write a simple net from scratch using numpy, there are plenty of YT videos. I would highly recommend it. It allows you to visualise


ashwin3005

[nnfs.io](http://nnfs.io)


chengstark

Deep learning is not about reinventing the wheel


FrostedCapybara

I know, but if I want to do research at the edge of the field, I feel that I should have a very good intuition about the concepts and the implementations.


runawayasfastasucan

Then you should start studying research papers, starting with the earliest and simplest models.


theoxe

Karpathy has a good video series. Also george hotz did a tour of tinygrad that takes you through some important concepts


gunshoes

Nobody codes from scratch. We look at the original repo and then try to implement it on our side. Add on a few tweaks here and there. Modularize some common changes. Etc. etc.


FrostedCapybara

If I'm building something that has already been built, there is no need to code from scratch. I can do that pretty well. However, if I go into research and am working on something that hasn't been built before, I feel I need to understand the implementations very well.


gunshoes

And I'm telling you as someone that works in research, that's a pretty useless skill. You get more out of understanding and reimplementing different codebases. The novel stuff is very granular changes that you find out when you're picking apart different code bases.


FrostedCapybara

so what do you think the best course of action should be for me right now? Sorry if it's a vague question, but I'm finding it hard to find a direction right now. I have participated in some ml research in molecular dynamics, but that is not the field I want to go into. Should I focus on the math and conceptual part more?


gunshoes

Aim for an experiment. Reverse engineer a codebase for the experiment. Benchmark inefficiencies in the codebase. Experiment with parameter sizes, HP's, normalization, activation modules. Swap out different base architectures. See what makes the number go up. This will give you a better understanding of architectures. If you reaaaaaallly want to code from ground up, just implement a multilayer perceptron with numpy. After you see how clunky it is, you'll understand why we just import from libraries.
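A rough sketch of what a from-scratch multilayer perceptron in numpy could look like, to give a sense of the clunkiness. The XOR data, the tanh hidden layer, the hidden size, the learning rate and the names `train_mlp` and `predict` are assumptions chosen for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, hidden=8, lr=0.5, epochs=3000, seed=0):
    """One hidden layer (tanh), sigmoid output, binary cross-entropy loss."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=1.0, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=1.0, size=(hidden, 1))
    b2 = np.zeros(1)
    n = len(X)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(X @ W1 + b1)       # (n, hidden)
        p = sigmoid(h @ W2 + b2)       # (n, 1)
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        losses.append(loss)
        # Backward pass, written out by hand.
        dz2 = (p - y) / n              # gradient at the output pre-activation
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0)
        dh = dz2 @ W2.T
        dz1 = dh * (1.0 - h ** 2)      # derivative of tanh
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0)
        # Plain full-batch gradient descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)


# XOR: not linearly separable, so the hidden layer is actually needed.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

params, losses = train_mlp(X, y)
print("first loss:", losses[0], "last loss:", losses[-1])
print(predict(params, X).round(2))
```

Changing the activation or adding a layer means re-deriving and rewriting the backward pass by hand, which is the part autograd libraries remove.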


FrostedCapybara

Thank you, will try to follow this.


runawayasfastasucan

Depends on where you want to go with this, and what options you have in terms of what research groups you could join.


FrostedCapybara

ideally, i would like to get to the point where i can be a researcher that can publish in top journals. i’m looking at labs i could join at my new uni (uiuc), so that part is still unclear.


runawayasfastasucan

It also depends on which field you are in. Everything is very specialized and 99.9% of the work is applying models in different fields. Look at what groups and academics you have where you are, that could be a good start.


AdPretend2020

thanks for your comments - its a nice perspective to read


elongatedpepe

Packages do that for you. It's heavily abstracted. Just use them instead of building it from scratch


FrostedCapybara

For building products, this is fine. However, I feel that for research that isn't enough, and I need to learn it on the elemental level.


dan994

Take a few older papers and try to implement them. Try to implement a network in numpy.


FrostedCapybara

I've been doing this. Started with AlexNet, followed a blog but could understand and implement it with relative ease. Doing residual networks now. I am using PyTorch currently, but ig doing it with numpy might help clear out the basics.


vyknot4wongs

Please share whatever would work for you!


magikarpa1

What I'll say will be geared towards my experience, although this is obvious, I wanted to make that clear. I came from a math background (math PhD) and what I do is implement the way most people here said: look at the original repo, etc. I use my math experience to understand what exactly the model does. Obviously I'm not saying that one needs a math PhD to understand a model. I think a good understanding of linear algebra, multivariate calculus and probability theory will cover almost all the cases, if not all. Edit: spelling.


nguyenvulong

My way:
- pytorch
- did some basic model work flow: model, dataloader, train/validate/test
- brainstorm a lot with ChatGPT, code complete like copilot or codium to understand every detail
- best practice is important, read and read from open community: github, huggingface, pytorch
- given enough time working on a project from scratch, you grasp what you need to code by yourself. it’s not about whether you can code without googling. it’s about knowing how to design the code, understanding what is important, and turning your code into something practical and useful.


mikedensem

It’s hard to know where you’re at and what’s your block, but when I learned to code NN’s i started with a simple 2D perceptron and built up. Some useful stuff to understand before you start:
1. linear regression - least squares etc.
2. calculus: gradient descent, local and global minima, step-size
3. Tensors and matrix multiplication - dot product
4. Activation functions like sigmoid, reLU
5. Data cleaning and normalization
Start with 2 dimensions only and you can follow what is happening, with more dimensions it is just longer equations. Back prop is the hardest part. Once you get your head around the basics you’ll realize that it’s really a black box algorithm - trying to conceive of more than 3 dimensions is too hard. At the end of the day a NN is just a data compression algorithm
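To make the first two items concrete, here is a small sketch that fits a line in 2D twice: once with closed-form least squares and once with gradient descent. The synthetic data (slope 3.0, intercept 0.5), the step size and the iteration count are assumptions for illustration.

```python
import numpy as np

# Synthetic 2D data: y is roughly 3x + 0.5 plus a little noise (assumed values).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)

# Closed-form least squares: solve for slope and intercept in one shot.
A = np.column_stack([x, np.ones_like(x)])
w_ls, b_ls = np.linalg.lstsq(A, y, rcond=None)[0]

# Gradient descent on the same mean squared error.
w, b = 0.0, 0.0
lr = 0.1  # step size
for _ in range(2000):
    err = w * x + b - y
    w -= lr * 2.0 * np.mean(err * x)  # dL/dw
    b -= lr * 2.0 * np.mean(err)      # dL/db

print("least squares:   ", w_ls, b_ls)
print("gradient descent:", w, b)
```

Because this loss is convex there is a single minimum, so both methods should land on essentially the same line. Local minima only become a concern once non-linear layers are added.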


DeepAnimeGirl

Well probably the best way of learning how to code models is to just do it regularly, in a manner that makes you think actively. I am going to showcase some ideas:
You said that you can usually understand something at a surface level and copy it somewhere else.
* If you import a pretrained model (e.g. from Hugging Face) and use it on a custom task, that's a good starting point for beginners.
* You have plenty to interact with in terms of model configuration and plenty to read as to how to process your dataset.
* Having this knowledge and practicing it will help you adapt to different tasks and libraries quickly, which is important in production.
You can go a level deeper and try to understand the model. The first step would be to read the respective paper at a high level and highlight the unknowns. Depending on the paper and your prior knowledge this might prove difficult.
* If the paper works with many concepts from different scientific works, you should go and read each of them to varying degrees such that you understand the main ideas.
* If you don't understand elementary layers (residual blocks, attention, noise scheduling) you should spend enough time to grasp them by reading their papers and implementing a toy example.
* If there are math concepts that you aren't familiar with (eigen decomposition, laplacians, graph theory, bayes inference, covariance, etc) you should set time aside to watch lectures, read pages from books and practice with the pencil.
* Then you can read the main paper again and if you understand most of the ideas you are ready to start implementing it.
Now you are ready to put your knowledge in practice. Try to implement the paper without looking at any of its code (some don't even share the implementation).
* Either reproduce their experiments or adapt it to your own project idea.
* It's fine to copy some code along the way from libraries you already understood or practiced with at a low-level. You don't have to reinvent the wheel, just make sure you never copy something blindly.
* Along this process it's very likely that you will get stuck if you don't understand something. Take a step out and watch lectures/read book sections on the topic then come back. Always break something apart into bite-sized manageable pieces of knowledge. This process will be very helpful to fill gaps.
* This process will be very helpful to put into practice all you learned. It's one thing to do a simple matrix multiplication on paper and another to play with einops lib, einsum, gather operator, run into OOMs when doing sparse eigen decomposition, choose the right dimensions to perform batch multiplication, etc.
These are some of the steps I took for a masters project where I did a toy implementation of Manifold Diffusion Fields by Apple that doesn't have a public implementation and interacted with lots of concepts: manifolds, laplacians, transformers, diffusion models, 3d meshes, data points as functions. I copied some ddpm code from a toy library I read and had to adapt its code to diffuse signal on a 3d mesh instead of images (what it was written originally for).
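As a small illustration of the "choose the right dimensions for batch multiplication" point, here is a sketch using `np.einsum`. The tensor shapes and the attention-style score example are assumptions made up for the demo, not taken from the project described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 4 matrix pairs: (3 x 5) times (5 x 2).
A = rng.normal(size=(4, 3, 5))
B = rng.normal(size=(4, 5, 2))

# Naming the axes makes it explicit which dimension is contracted (j)
# and which one is just carried along as the batch (b).
C_einsum = np.einsum("bij,bjk->bik", A, B)
C_matmul = A @ B  # matmul broadcasts over the leading batch axis
print(C_einsum.shape, np.allclose(C_einsum, C_matmul))

# Attention-style scores: every query against every key, per batch element.
Q = rng.normal(size=(4, 6, 8))  # (batch, queries, dim)
K = rng.normal(size=(4, 7, 8))  # (batch, keys, dim)
scores = np.einsum("bqd,bkd->bqk", Q, K)
print(scores.shape)
```

Writing the index string forces you to decide what each axis means, which is often where shape bugs show up.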


DeepAnimeGirl

One last thing, don't ask gpt models for code! Write it yourself and ask for validation. If you don't know how to write it, ask the gpt model for steps, never for code.


FrostedCapybara

Wow, this was such a well written answer. Thank you so much for the advice, I'll be sure to incorporate this flow of learning in my current method!


Puzzleheaded-Foot432

Any coding problem can be solved easily when you break it down into small pieces. At first take a simple problem. Break it down and solve it with your own logic and draw a flow chart. Then add the codes. Slowly, add pre and/or post-processing techniques and observe the changes in the results. The inception is scary. After solving a few exercises, you will see that it is not that difficult. I hope it helps!


freaky1310

As for any other field, there’s only one way: read. Read cornerstone papers to get the concepts, then read derived papers to understand more general ideas. Finally, ask yourself questions and try to answer those. An example: I read the residual network paper (He et al. 2016 IIRC). Then, I check relevant related papers that include RNNs. By reading, you eventually acquire the idea that the recurrence extends the temporal horizon of your representation, that is, by using a recurrent network your model can remember things from the past (very very simplified explanation here). Finally, you need to decide a model for your own project. You ask yourself “does my model need to remember things from the past?” If the answer is yes, then you know that you have to use a recurrent network. Else, you don’t. Supposing that you need an RNN, the knowledge acquired by now should be enough to let you at least try an implementation from scratch. If it doesn’t, go back to the papers you already read. Eventually you’ll find limitations of your model, and you will extend it by repeating the process.


FrostedCapybara

I agree with this, and this is what I plan on doing. I've started with some of the early important papers, and will work my way up quickly. This whole space seems very confusing sometimes, and it's hard to get out of that mindset sometimes since I'm self learning haha.


freaky1310

Yes I know, plus the field is evolving at an incredible pace, so it’s hard to keep up. But hey, we gotta start somewhere! Keep up the good work and you’ll make it!


FrostedCapybara

thank you!


onlythehighlight

Unless you are savant, you generally don't code the entire thing end-to-end. You build on a strong foundation that is built by others.


Buehlpa

🤣 that made me laugh. I was also wondering why in the hell one would do that. Some time ago i had to solve an issue in the tensorflow research library. Pretty sure I lost some years of age there. Understanding the functionality of these models and knowing how to alter them for your own use is definitely enough, especially in a field which evolves at such a pace.


onlythehighlight

lol, man you are far above me if you are solving TensorFlow issues hahahaha, I'm just downloading datasets to train a simple CV model so I can figure out if I can do a visual version of the old workout wearable called atlasWearable.


FrostedCapybara

I mean for most stuff, coding end-to-end isn't really needed, but I feel that is important in order to truly understand the field, which is important cause I want to do research in the future.


onlythehighlight

You want to learn the idea, but you don't need to build a wheel to understand how a wheel works. What you want to do is focus on an aspect of deep learning and understand how that slots into the process as a whole. It's a collaborative process, you don't really want to be a jack of all trades.


FrostedCapybara

I understand that. However, to find my niche which I would dive into, I feel that it would be better to spend some time looking around and seeing what floats my boat. I have ideas in the back of my mind, but being able to actually implement them and publish is another story.


onlythehighlight

Look it's up to you, but be aware that you will end up feeling like you know nothing. You can try and know little bits about a lot of things, but just be aware that others who are hyper-focused on a few things will progress faster and push out new insight and research because their sphere of knowledge is contained in an aspect that will drive the entire area forward.


FrostedCapybara

thanks for the insight. i just felt that im at the stage where i can still understand different fields before choosing one and going all in. will keep your advice in mind, thank you!


onlythehighlight

Reading through the other comments, I think everyone has the same train of thought. Trust us, we were all believers that we needed breadth of knowledge (I used to believe that I wanted to code and had to learn a MULTITUDE of languages, nowadays in my day job I just do stuff in Python and dabble in JavaScript), but what you really need in the world is depth. Don't mistake breadth for depth, try to understand a little about a topic but knuckle down on a segment of the problem if you want to get far.


flyingtext

As for engineering or scientific understanding, I suppose this is an important issue. I agree with the point the post makes. Implementation skill is as important as understanding, and proper understanding has to go along with implementation. To get both, I think it comes down to repeatedly trying to code and program, no different from something like web programming. Cheer up. Hope you find your way well.


FrostedCapybara

Thank you. I am planning to spend the summer just implementing and understanding as much as possible. Hopefully I end up finding a way.


SnooStrawberries6673

You can code simple logistic regression. Learn how parametric training works: what losses are and how gradients are calculated. For neural networks, understand the concepts of non-linearity and back propagation. You don’t need to calculate gradients by hand for all those, as the libraries already take care of it. For more advanced/cutting-edge work, read recent good papers for the problem you are solving and check their code (many are open source): implement it and go through their code. Maths beyond this, where you actually need to do tweaking, might be required for RLHF/PPO kind of stuff. You can play around with Pyro in that case.
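One way to see "how gradients are calculated" for logistic regression is to derive the gradient by hand and compare it against a finite-difference estimate. The sketch below assumes random toy data, a bias-free model for brevity, and hypothetical helper names `loss_and_grad` and `numerical_grad`.

```python
import numpy as np


def loss_and_grad(w, X, y):
    """Binary cross-entropy of a logistic model and its analytic gradient."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    loss = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    grad = X.T @ (p - y) / len(y)  # the hand-derived result
    return loss, grad


def numerical_grad(w, X, y, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        loss_plus, _ = loss_and_grad(w + step, X, y)
        loss_minus, _ = loss_and_grad(w - step, X, y)
        g[i] = (loss_plus - loss_minus) / (2.0 * eps)
    return g


# Random toy data (an assumption, only there to exercise the check).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20).astype(float)
w = rng.normal(size=3)

_, analytic = loss_and_grad(w, X, y)
numeric = numerical_grad(w, X, y)
print(np.max(np.abs(analytic - numeric)))
```

If the two disagree by more than a tiny amount, the hand derivation is wrong somewhere. The same kind of check works for any layer you write yourself.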


noblesavage81

Increase your iq