• 0 Posts
• 12 Comments
Joined 1 year ago
Cake day: December 8th, 2024

  • Fiery@lemmy.dbzer0.com to 196@lemmy.blahaj.zone · Mythos rule
    12 days ago

    The best measure is indeed the final impact of these systems. However, that impact is very hard to measure properly, and its importance doesn't make benchmarks useless. Well-designed benchmarks are still good data points for measuring advances in the technology. If a model failed at a realistic task before and the next generation can do it, that often translates into a real improvement in impact. That said, a 2x improvement on a benchmark doesn't mean the model will have 2x the impact.

    A benchmark can be run automatically and often, while real impact studies take time.

    In software development, the best measure of quality is end users having no issues, but that doesn't suddenly make automated testing (unit/integration/end-to-end) irrelevant.

  • Really funny that they coloured it differently, because Flanders literally shares a language with the Netherlands.

    To be fair, half the world sometimes seems to forget that Belgium isn't all French, or treats French as the default, even though Flanders' population is almost twice that of Wallonia. Even with the populations of Brussels and Wallonia combined, Flanders still has the larger population. (Figures from Statbel.)