24
c/ai-innovations • james533 • 11d ago

Question about AI model training data quality

I keep seeing people in my local tech meetup talk about training models on any data they can scrape, focusing only on volume. Last month, a developer from Austin showed a project that failed because the training set was full of duplicate and low-quality forum posts. The model just repeated nonsense. I think clean, verified data matters more than sheer size. Has anyone else run into this and found a good way to source better datasets?
2 comments

michael895 • 11d ago
Wait, they just used any forum posts they could find? I mean, that's basically asking for a model to just spit back garbage.
3
river_gonzalez66
Isn't it more about how they filter and clean the data first?
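Something like this is roughly what I mean by filtering and cleaning. It's only a minimal sketch, not what the Austin project did: the function names, the 5-word minimum, and the "lowercase and collapse whitespace" rule are all my own assumptions, and it only catches exact duplicates after normalization, not near-duplicates.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def clean_posts(posts, min_words=5):
    """Drop very short posts and exact duplicates (after normalization).

    min_words is an arbitrary illustrative threshold, not a recommended value.
    """
    seen = set()
    kept = []
    for post in posts:
        norm = normalize(post)
        if len(norm.split()) < min_words:
            continue  # too short to carry much signal
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # already kept an identical post
        seen.add(digest)
        kept.append(post)  # keep the original text, first occurrence wins
    return kept
```

You'd run scraped posts through `clean_posts` before training. Real pipelines usually go further (near-duplicate detection, language and quality scoring), but even this would catch straight reposts.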
1