Learn Machine Learning


Welcome! This is a place for people to learn more about machine learning techniques, discuss applications and ask questions.


Similar subreddits: r/MLquestions, r/askmachinelearning, r/learnmachinelearning

31
 
 

cross-posted from: https://lemmy.intai.tech/post/40699


@Yampeleg: The first model to beat 100% of ChatGPT-3.5, available on Hugging Face.

🔥 OpenChat_8192

🔥 105.7% of ChatGPT (Vicuna GPT-4 Benchmark)

Less than a month ago, the world watched as ORCA [1] became the first model ever to outpace ChatGPT on Vicuna's benchmark.

Today, the race to replicate these results in open source comes to an end.

Minutes ago OpenChat scored 105.7% of ChatGPT.

But wait! There is more!

Not only did OpenChat beat Vicuna's benchmark, it did so while pulling off a LIMA [2] move!

Training was done using 6K GPT-4 conversations out of the ~90K ShareGPT conversations.

The model comes in three versions: the basic OpenChat model, OpenChat-8192, and OpenCoderPlus (code generation: 102.5% of ChatGPT).

This is a significant achievement, considering that it's the first (released) open-source model to surpass the Vicuna benchmark. 🎉🎉

Congratulations to the authors!!


[1] Orca: the first model to cross 100% of ChatGPT: https://arxiv.org/pdf/2306.02707.pdf
[2] LIMA: Less Is More for Alignment. TL;DR: using a small number of very high-quality samples (1,000 in the paper) can be as powerful as much larger datasets: https://arxiv.org/pdf/2305.11206

38
 
 

From Nature.com - Statistics for Biologists: a series of short articles that give a nice introduction to several topics. Because the audience is biologists, the articles are light on math and equations.

39
 
 

Bing, ChatGPT, etc.

40
 
 

This paper highlights an issue that many people don't think about. FYI: when trying to compare or reproduce results, always try to get the dataset from the same source as the original author and scale it in the same way. Unfortunately, many authors assume the scaling is obvious and don't document it, but changes in scaling can lead to very different results.
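
To make this concrete, here is a minimal sketch (mine, not from the paper) of how the choice of scaling alone can move a reported score. It assumes scikit-learn is installed; the dataset and model are arbitrary stand-ins, and the point is only that "we scaled the data" is not enough detail to reproduce a result.

# Compare two common scalings of the same data on the same model.
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

for name, scaler in [("standardized (zero mean, unit variance)", StandardScaler()),
                     ("min-max scaled to [0, 1]", MinMaxScaler())]:
    Xs = scaler.fit_transform(X)
    score = cross_val_score(KNeighborsClassifier(n_neighbors=5), Xs, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")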

41
 
 

Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

I have two 2-D arrays with the same first axis dimensions. In python, I would like to convolve the two matrices along the second axis only. I would like to get C below without computing the convolution along the first axis as well.

import numpy as np
import scipy.signal as sg

M, N, P = 4, 10, 20
A = np.random.randn(M, N)
B = np.random.randn(M, P)

# Full 2-D convolution, then keep only the middle row along the first axis.
# (Integer division is needed for the index in Python 3.)
C = sg.convolve(A, B, 'full')[(2*M - 1) // 2]

Is there a fast way?
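
Not the original answer, but one sketch of a faster approach, assuming SciPy >= 1.2 (where fftconvolve accepts an axes argument): it convolves each row of A with the corresponding row of B along the second axis only, and an explicit loop is included as a correctness check.

import numpy as np
import scipy.signal as sg

M, N, P = 4, 10, 20
A = np.random.randn(M, N)
B = np.random.randn(M, P)

# Row-wise convolution along axis 1 only; result has shape (M, N + P - 1).
C_fast = sg.fftconvolve(A, B, mode='full', axes=1)

# Reference: an explicit loop over the first axis.
C_loop = np.array([np.convolve(A[i], B[i], mode='full') for i in range(M)])

print(C_fast.shape)                 # (4, 29)
print(np.allclose(C_fast, C_loop))  # True, up to floating-point error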

42
 
 

Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question: Hello everyone!! I am a final-year undergrad (electrical engineering) student who has just dabbled in machine learning. During our one-month training period, I chose to do a course in ML/AI and am now looking to build a project, mainly on electrical energy consumption prediction. I came across this cool code/project (in the link), but cannot understand a word of it ;( If any of you could spare some time and explain this code to me, I'd be grateful.

What project do you consider good for a beginner, one that I can also easily explain to others? Do you have any ideas?
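
Not the linked project, but as one hedged suggestion for a beginner-sized version of the same idea: predict the next hour's consumption from the previous 24 hours with plain linear regression. The data below is synthetic (a fake daily/weekly cycle plus noise); a real project would swap in an actual household or building consumption CSV.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
hours = np.arange(24 * 365)
# Fake hourly load: daily cycle + weekly cycle + noise.
load = (10
        + 3.0 * np.sin(2 * np.pi * hours / 24)
        + 1.5 * np.sin(2 * np.pi * hours / (24 * 7))
        + rng.normal(0, 0.5, hours.size))

lags = 24
X = np.array([load[i:i + lags] for i in range(load.size - lags)])  # previous 24 hours
y = load[lags:]                                                    # the next hour

split = int(0.8 * len(X))  # train on the first 80% of the year, test on the rest
model = LinearRegression().fit(X[:split], y[:split])
print("R^2 on held-out hours:", model.score(X[split:], y[split:]))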

43
 
 

See also: the phenomenon of double descent.

44
 
 

A nice visualization/example of the kernel trick. A more mathematical explanation can be found here.
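
For anyone who wants to check the idea numerically, here is a tiny sketch (mine, not from either link): the degree-2 polynomial kernel k(x, y) = (x · y)^2 is exactly an ordinary dot product in an explicit quadratic feature space, which is the space the trick lets you avoid building.

import numpy as np

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel in 2D:
    # phi(x1, x2) = (x1^2, x2^2, sqrt(2) * x1 * x2)
    x1, x2 = v
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

rng = np.random.default_rng(1)
x, y = rng.normal(size=2), rng.normal(size=2)

kernel_value = np.dot(x, y) ** 2         # computed in the original 2D space
explicit_value = np.dot(phi(x), phi(y))  # computed in the mapped 3D space

print(np.isclose(kernel_value, explicit_value))  # True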

45
 
 

This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question: Autoencoders and auto-associative memory seem to be closely related. It appears the terminology has changed; is there an actual difference between the two, or did the wording simply change over time?

46
 
 

This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question: Using interleaving half circles or a circle inside another circle seems common for 2D; are there similar shapes people use for higher dimensions? Maybe a hypersphere inside a hypersphere? I was thinking of trying something more complicated than a bunch of blobs.
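
One option (my own sketch, not a standard library function): two concentric hyperspheres in d dimensions, built by drawing Gaussian points, normalizing them to unit length to get uniform directions on the sphere, then scaling by each class's radius and adding a little noise. This is essentially make_circles generalized to arbitrary dimension.

import numpy as np

def concentric_hyperspheres(n_per_class=500, dim=10, radii=(1.0, 2.0), noise=0.1, seed=0):
    # One class per radius; labels are the indices into `radii`.
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, r in enumerate(radii):
        g = rng.normal(size=(n_per_class, dim))
        directions = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform directions
        X.append(r * directions + rng.normal(scale=noise, size=(n_per_class, dim)))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

X, y = concentric_hyperspheres(dim=10)
print(X.shape, np.bincount(y))  # (1000, 10) [500 500]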

47
 
 

I found this book to be a very good reference when learning about autoencoders.

48
 
 

Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question:

I have a dataset that contains vectors of shape 1xN, where N is the number of features. Each value is a float between -4 and 5. For my project I need to build an autoencoder; however, activation functions like ReLU or tanh will either only allow positive values through the layers or constrain them to the range -1 to 1. My concern is that upon decoding from the latent space, the data will not be represented in the same way: I will get vectors with only positive values or with constrained negative values, while I want the output to be close to the original.

Should I apply some kind of transformation (adding a positive constant, exp(), or raising the data to the power 2), train the VAE, and then apply log() or log2() to the output when I want the original representation? Or am I missing some configuration of activation functions that can give me an output similar to the original input?
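
Not the OP's setup, but a sketch of the simpler fix: an invertible affine min-max rescaling into [0, 1] (so a sigmoid output layer works), undone with inverse_transform after decoding. The other common option is to keep ReLU/tanh in the hidden layers but give the final decoder layer a linear (identity) activation and train with MSE, so the output range is unrestricted. The transforms in the question are riskier: squaring is not invertible for negative values, and exp/log changes the error scale.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
data = rng.uniform(-4, 5, size=(1000, 32))  # stand-in for the real 1xN vectors

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(data)         # feed this to the autoencoder

# ... train the autoencoder on `scaled` and decode; here the decoder output
# is faked as a perfect reconstruction just to show the round trip ...
reconstructed = scaled

recovered = scaler.inverse_transform(reconstructed)
print(np.allclose(recovered, data))         # True for this placeholder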

49
 
 

Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question:

I am working on a spatial time series analysis project. The task is to study the spatial distribution of point features (e.g., crime events, traffic accidents) over time. I aim to find the places with the following characteristics given the spatial distribution of point features across time:

  • places with consistently high-level concentration of point features
  • places with periodically high-level concentration of point features. "Periodic" here might mean that a place only has a great number of point features during special events (e.g., a ceremony or a national holiday)
  • places with suddenly high-level concentration of point features

I have used the kernel density estimation (KDE) method to compute the density of point features across the study area over the study timeline. This way, I get a time series of kernel densities for each location on the map, i.e., a matrix in which rows represent locations and columns represent time periods. Then what's next? How can I statistically find places with a large number of point features but different levels of temporal consistency? For instance, the following figure shows the spatial distribution of kernel densities of locations in New York City for four consecutive periods (in total I have about 15 periods). Red indicates high kernel densities, while green indicates low kernel densities.

I have tried the clustering techniques (e.g., KMeans and KShape) offered by the tslearn package in Python to cluster the time series of kernel density values of all the locations, but I can only differentiate them somewhat visually. Are there any statistical methods to achieve this goal?
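
Not the OP's code, but one simple statistical pass (a sketch with arbitrary placeholder thresholds, to be tuned or replaced with quantiles/tests) over the location-by-time matrix of kernel densities: a consistently hot place has a high mean and low relative variability, a periodically hot place is high on average but fluctuates, and a "sudden" place is dominated by one or a few spikes relative to its own history.

import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the real matrix: 500 locations x 15 periods of kernel densities.
densities = rng.gamma(shape=2.0, scale=1.0, size=(500, 15))

mean_level = densities.mean(axis=1)
cv = densities.std(axis=1) / (mean_level + 1e-9)                                 # relative variability
z_peak = (densities.max(axis=1) - mean_level) / (densities.std(axis=1) + 1e-9)  # spikiness

high = mean_level > np.quantile(mean_level, 0.9)  # top 10% by overall level

consistent = high & (cv < 0.3)                    # high and stable over time
periodic = high & (cv >= 0.3) & (z_peak < 2.5)    # high but regularly fluctuating
sudden = z_peak >= 2.5                            # dominated by one or a few spikes

print(consistent.sum(), periodic.sum(), sudden.sum())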

50
 
 

Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question:

The more I read, the more confused I am about how to interpret validation and training loss graphs, so I would like some guidance on how to interpret the values shown in the picture. I am training a basic UNet architecture. I am now wondering whether I need a more complex network model, or whether I just need more data to improve the accuracy.

Historical note: I had an issue where the validation loss was exploding after a few epochs, but I added dropout layers and that seems to have fixed it.

My current interpretation is that the validation loss is slowly increasing, so does that mean it's useless to train further? Or should I let it train longer, because the validation accuracy seems to jump up a little bit sometimes?
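
Not the OP's training code, but a sketch of the usual answer to "should I keep training?": stop when the validation loss has not improved for a set number of epochs (the patience) and keep the weights from the best epoch. The loss values below are placeholders standing in for whatever the real training loop produces.

# Early stopping with patience over a list of per-epoch validation losses.
val_losses = [0.90, 0.62, 0.55, 0.53, 0.54, 0.52, 0.53, 0.55, 0.56, 0.58]

patience = 3
best_loss = float("inf")
best_epoch = 0

for epoch, loss in enumerate(val_losses):
    if loss < best_loss:
        best_loss, best_epoch = loss, epoch
        # In a real loop, save a checkpoint of the model weights here.
    elif epoch - best_epoch >= patience:
        print(f"Stopping at epoch {epoch}; best was epoch {best_epoch} (val loss {best_loss:.2f}).")
        break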
