Data Engine

🚂 Data Engine

High-level design (HLD) of the Data Engine used internally at metaforms.ai.

Hypothesis:

  • Unknown unknowns: A dataset is always imperfect. Not every scenario is well represented yet, and the data can always be made more diverse.
  • Capable base model/architecture: Given a capable base model or architecture, improving the dataset improves the AI/product guarantees.

Inspirations:

{% embed url="https://www.youtube.com/watch?v=zPH5O8hRfMA" %}

“The only sure certain way I have seen of making progress on any task is, you curate the dataset that is clean and varied and you grow it and you pay the labeling cost and I know that works.”

“Potentially nitpicky but competitive advantage in AI goes not so much to those with data but those with a data engine. And whoever can spin it fastest. Slide from Tesla to ~illustrate but concept is general”

QualEval: Qualitative Evaluation for Model Improvement

{% embed url="https://x.com/georgejrjrjr/status/1729996423457091731?s=20" %}

https://medium.com/swlh/about-the-long-tail-113e98ce8717