Learn Machine Learning


Welcome! This is a place for people to learn more about machine learning techniques, discuss applications and ask questions.


Similar subreddits: r/MLquestions, r/askmachinelearning, r/learnmachinelearning

31
 
 

cross-posted from: https://lemmy.intai.tech/post/40699


@Yampeleg: The first model to beat 100% of ChatGPT-3.5, available on Hugging Face.

🔥 OpenChat_8192

🔥 105.7% of ChatGPT (Vicuna GPT-4 Benchmark)

Less than a month ago, the world watched as ORCA [1] became the first model ever to outpace ChatGPT on Vicuna's benchmark.

Today, the race to replicate these results in open source comes to an end.

Minutes ago OpenChat scored 105.7% of ChatGPT.

But wait! There is more!

Not only did OpenChat beat Vicuna's benchmark, it did so while pulling off a LIMA [2] move!

Training was done using 6K GPT-4 conversations out of the ~90K ShareGPT conversations.

The model comes in three versions: the basic OpenChat model, OpenChat-8192, and OpenCoderPlus (code generation: 102.5% of ChatGPT).

This is a significant achievement, considering that it's the first (released) open-source model to surpass the Vicuna benchmark. 🎉🎉

Congratulations to the authors!!


[1] Orca: the first model to cross 100% of ChatGPT: https://arxiv.org/pdf/2306.02707.pdf
[2] LIMA: Less Is More for Alignment. TL;DR: using a small number of very high-quality samples (1,000 in the paper) can be as powerful as much larger datasets: https://arxiv.org/pdf/2305.11206

38
 
 

From Nature.com - Statistics for Biologists: a series of short articles that give a nice introduction to several topics. Because the audience is biologists, the articles are light on math and equations.

39
 
 

Bing, ChatGPT, etc.

40
 
 

This paper highlights an issue that many people don't think about. FYI: when trying to compare or reproduce results, always try to get the dataset from the same source as the original author and scale it in the same way. Unfortunately, many authors assume the scaling is obvious and don't document it, but changes in scaling can lead to very different results.
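
To make this concrete, here is a minimal sketch (mine, not from the paper) of how the choice of scaling alone can move a reported score. It assumes scikit-learn is installed; the dataset and model are arbitrary stand-ins, and the point is only that "we scaled the data" is not enough detail to reproduce a result.

# Compare two common scalings of the same data on the same model.
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

for name, scaler in [("standardized (zero mean, unit variance)", StandardScaler()),
                     ("min-max scaled to [0, 1]", MinMaxScaler())]:
    Xs = scaler.fit_transform(X)
    score = cross_val_score(KNeighborsClassifier(n_neighbors=5), Xs, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")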

41
 
 

Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

I have two 2-D arrays with the same first axis dimensions. In python, I would like to convolve the two matrices along the second axis only. I would like to get C below without computing the convolution along the first axis as well.

import numpy as np
import scipy.signal as sg

M, N, P = 4, 10, 20
A = np.random.randn(M, N)
B = np.random.randn(M, P)

# Full 2-D convolution, then keep only the middle row along the first axis.
# (Integer division is needed for the index in Python 3.)
C = sg.convolve(A, B, 'full')[(2*M - 1) // 2]

Is there a fast way?
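
Not the original answer, but one sketch of a faster approach, assuming SciPy >= 1.2 (where fftconvolve accepts an axes argument): it convolves each row of A with the corresponding row of B along the second axis only, and an explicit loop is included as a correctness check.

import numpy as np
import scipy.signal as sg

M, N, P = 4, 10, 20
A = np.random.randn(M, N)
B = np.random.randn(M, P)

# Row-wise convolution along axis 1 only; result has shape (M, N + P - 1).
C_fast = sg.fftconvolve(A, B, mode='full', axes=1)

# Reference: an explicit loop over the first axis.
C_loop = np.array([np.convolve(A[i], B[i], mode='full') for i in range(M)])

print(C_fast.shape)                 # (4, 29)
print(np.allclose(C_fast, C_loop))  # True, up to floating-point error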

42
 
 

Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question: Hello everyone!! I am a final-year undergrad (electrical engineering) student who has just dabbled in machine learning. During our one-month training period, I chose to do a course in ML/AI and am now looking to build a project, mainly on electrical energy consumption prediction. I came across this cool code/project (in the link), but cannot understand a word of it ;( If any of you could spare some time and explain this code to me, I'd be grateful.

What project do you consider good for a beginner, one that I can also easily explain to others? Do you have any ideas?
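
Not the linked project, but as one hedged suggestion for a beginner-sized version of the same idea: predict the next hour's consumption from the previous 24 hours with plain linear regression. The data below is synthetic (a fake daily/weekly cycle plus noise); a real project would swap in an actual household or building consumption CSV.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
hours = np.arange(24 * 365)
# Fake hourly load: daily cycle + weekly cycle + noise.
load = (10
        + 3.0 * np.sin(2 * np.pi * hours / 24)
        + 1.5 * np.sin(2 * np.pi * hours / (24 * 7))
        + rng.normal(0, 0.5, hours.size))

lags = 24
X = np.array([load[i:i + lags] for i in range(load.size - lags)])  # previous 24 hours
y = load[lags:]                                                    # the next hour

split = int(0.8 * len(X))  # train on the first 80% of the year, test on the rest
model = LinearRegression().fit(X[:split], y[:split])
print("R^2 on held-out hours:", model.score(X[split:], y[split:]))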

43
 
 

See also: the phenomenon of double descent.

44
 
 

A nice visualization/example of the kernel trick. A more mathematical explanation can be found here.
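
For anyone who wants to check the idea numerically, here is a tiny sketch (mine, not from either link): the degree-2 polynomial kernel k(x, y) = (x · y)^2 is exactly an ordinary dot product in an explicit quadratic feature space, which is the space the trick lets you avoid building.

import numpy as np

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel in 2D:
    # phi(x1, x2) = (x1^2, x2^2, sqrt(2) * x1 * x2)
    x1, x2 = v
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

rng = np.random.default_rng(1)
x, y = rng.normal(size=2), rng.normal(size=2)

kernel_value = np.dot(x, y) ** 2         # computed in the original 2D space
explicit_value = np.dot(phi(x), phi(y))  # computed in the mapped 3D space

print(np.isclose(kernel_value, explicit_value))  # True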

45
 
 

This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question: Autoencoders and auto-associative memory seem to be closely related. It appears the terminology has changed; is there an actual difference between the two, or did the wording simply change over time?

46
 
 

This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question: Using interleaving half circles or a circle inside another circle seems common for 2D; are there similar shapes people use for higher dimensions? Maybe a hypersphere inside a hypersphere? I was thinking of trying something more complicated than a bunch of blobs.
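
One option (my own sketch, not a standard library function): two concentric hyperspheres in d dimensions, built by drawing Gaussian points, normalizing them to unit length to get uniform directions on the sphere, then scaling by each class's radius and adding a little noise. This is essentially make_circles generalized to arbitrary dimension.

import numpy as np

def concentric_hyperspheres(n_per_class=500, dim=10, radii=(1.0, 2.0), noise=0.1, seed=0):
    # One class per radius; labels are the indices into `radii`.
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, r in enumerate(radii):
        g = rng.normal(size=(n_per_class, dim))
        directions = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform directions
        X.append(r * directions + rng.normal(scale=noise, size=(n_per_class, dim)))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

X, y = concentric_hyperspheres(dim=10)
print(X.shape, np.bincount(y))  # (1000, 10) [500 500]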

47
 
 

I found this book to be a very good reference when learning about autoencoders.

48
 
 

Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question:

I have a dataset that contains vectors of shape 1xN, where N is the number of features. Each value is a float between -4 and 5. For my project I need to build an autoencoder; however, activation functions like ReLU or tanh will either only allow positive values through the layers or constrain them to the range -1 to 1. My concern is that upon decoding from the latent space, the data will not be represented in the same way: I will get vectors with only positive values or with constrained negative values, while I want the output to be close to the original.

Should I apply some kind of transformation (adding a positive constant, exp(), or raising the data to the power 2), train the VAE, and then apply log() or log2() to the output when I want the original representation? Or am I missing some configuration of activation functions that can give me an output similar to the original input?
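
Not the OP's setup, but a sketch of the simpler fix: an invertible affine min-max rescaling into [0, 1] (so a sigmoid output layer works), undone with inverse_transform after decoding. The other common option is to keep ReLU/tanh in the hidden layers but give the final decoder layer a linear (identity) activation and train with MSE, so the output range is unrestricted. The transforms in the question are riskier: squaring is not invertible for negative values, and exp/log changes the error scale.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
data = rng.uniform(-4, 5, size=(1000, 32))  # stand-in for the real 1xN vectors

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(data)         # feed this to the autoencoder

# ... train the autoencoder on `scaled` and decode; here the decoder output
# is faked as a perfect reconstruction just to show the round trip ...
reconstructed = scaled

recovered = scaler.inverse_transform(reconstructed)
print(np.allclose(recovered, data))         # True for this placeholder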

49
 
 

Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question:

I am working on a spatial time series analysis project. The task is to study the spatial distribution of point features (e.g., crime events, traffic accidents) over time. I aim to find the places with the following characteristics given the spatial distribution of point features across time:

  • places with consistently high-level concentration of point features
  • places with periodically high-level concentration of point features. "Periodic" here might mean that a place only has a great number of point features during special events (e.g., a ceremony or a national holiday)
  • places with suddenly high-level concentration of point features

I have used the kernel density estimation (KDE) method to compute the density of point features across the study area over the study timeline. This way, I get a time series of kernel densities for each location on the map, i.e., a matrix in which rows represent locations and columns represent time periods. Then what's next? How can I statistically find places with a large number of point features but different levels of temporal consistency? For instance, the following figure shows the spatial distribution of kernel densities of locations in New York City for four consecutive periods (in total I have about 15 periods). Red indicates high kernel densities, while green indicates low kernel densities.

I have tried the clustering techniques (e.g., KMeans and KShape) offered by the tslearn package in Python to cluster the time series of kernel density values of all the locations, but I can only differentiate them somewhat visually. Are there any statistical methods to achieve this goal?
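
Not the OP's code, but one simple statistical pass (a sketch with arbitrary placeholder thresholds, to be tuned or replaced with quantiles/tests) over the location-by-time matrix of kernel densities: a consistently hot place has a high mean and low relative variability, a periodically hot place is high on average but fluctuates, and a "sudden" place is dominated by one or a few spikes relative to its own history.

import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the real matrix: 500 locations x 15 periods of kernel densities.
densities = rng.gamma(shape=2.0, scale=1.0, size=(500, 15))

mean_level = densities.mean(axis=1)
cv = densities.std(axis=1) / (mean_level + 1e-9)                                 # relative variability
z_peak = (densities.max(axis=1) - mean_level) / (densities.std(axis=1) + 1e-9)  # spikiness

high = mean_level > np.quantile(mean_level, 0.9)  # top 10% by overall level

consistent = high & (cv < 0.3)                    # high and stable over time
periodic = high & (cv >= 0.3) & (z_peak < 2.5)    # high but regularly fluctuating
sudden = z_peak >= 2.5                            # dominated by one or a few spikes

print(consistent.sum(), periodic.sum(), sudden.sum())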

50
 
 

Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question:

The more I read, the more confused I am about how to interpret validation and training loss graphs, so I would like some guidance on how to interpret the values shown in the picture. I am training a basic UNet architecture. I am now wondering whether I need a more complex network model, or whether I just need more data to improve the accuracy.

Historical note: I had an issue where the validation loss was exploding after a few epochs, but I added dropout layers and that seems to have fixed it.

My current interpretation is that the validation loss is slowly increasing, so does that mean it's useless to train further? Or should I let it train longer, because the validation accuracy seems to jump up a little bit sometimes?
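
Not the OP's training code, but a sketch of the usual answer to "should I keep training?": stop when the validation loss has not improved for a set number of epochs (the patience) and keep the weights from the best epoch. The loss values below are placeholders standing in for whatever the real training loop produces.

# Early stopping with patience over a list of per-epoch validation losses.
val_losses = [0.90, 0.62, 0.55, 0.53, 0.54, 0.52, 0.53, 0.55, 0.56, 0.58]

patience = 3
best_loss = float("inf")
best_epoch = 0

for epoch, loss in enumerate(val_losses):
    if loss < best_loss:
        best_loss, best_epoch = loss, epoch
        # In a real loop, save a checkpoint of the model weights here.
    elif epoch - best_epoch >= patience:
        print(f"Stopping at epoch {epoch}; best was epoch {best_epoch} (val loss {best_loss:.2f}).")
        break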
