Model Evaluation - Search News

A CIO's And CDO's Playbook For Operationalizing Generative AI

The transition from pilot projects to enterprise-scale impact demands more than flashy demos or isolated proofs of concept.

Catholic News Agency

Anti-assisted-suicide group says suicide laws expanding throughout U.S. in 2025

Patients Rights Action Fund coalitions director Jessica Rodgers explained that most states that allow assisted suicide follow ...

2 No-Brainer Artificial Intelligence (AI) Stocks to Buy Right Now

Investments related to artificial intelligence (AI) tend to attract considerable interest and investor returns. As this technology offers new innovations and transforms existing industries, these ...

ExecutiveGov

MITRE, FAA Launch Aerospace LLM Evaluation Benchmark

MITRE said the ALUE benchmark for aerospace LLM evaluation supports custom datasets, open-source LLMs and user-defined prompts.

Oncology Nurse Advisor

Specialized Urgent Care for Oncology Patients Improves Outcomes, Reduces Admissions

The creation of an urgent care department specifically for oncology patients ensured continuity of care for these patients, particularly those receiving outpatient care.

FedScoop

USAi tool lets agencies test for AI biases, GSA official says

GSA launched the USAi.gov site last month, giving federal agencies the ability to test leading AI models before procuring ...

Columbia Journalism Review

Journalists Need Their Own Benchmark Tests for AI Tools

“Most people who use AI for science seem content to allow the developers of AI tools to evaluate their usefulness using their ...

Best of 2026 vehicles list kicks off NACTOY automotive awards season

The North American Car, Truck and Utility Vehicle of the year jury revealed the 30 candidates for the awards Wednesday at Michigan Central.

The Daily Reflector

MITRE and FAA Introduce Novel Aerospace Large Language Model Evaluation Benchmark

The Federal Aviation Administration (FAA) and MITRE are introducing a new benchmark to enable the evaluation and assessment of large language models (LLMs) for aerospace tasks. Given the ...

AI models know when they're being tested - and change their behavior, research shows

New joint safety testing from UK-based nonprofit Apollo Research and OpenAI set out to reduce secretive behaviors like scheming in AI models. What researchers found could complicate promising ...

Scientific Research Publishing

A Review of Agent Data Evaluation: Status, Challenges, and Future Prospects as of 2025

Wang, S. (2025) A Review of Agent Data Evaluation: Status, Challenges, and Future Prospects as of 2025. Journal of Software ...

BMJ Evidence-Based Medicine

Hidden risks of predictive models in healthcare

Joseph Alderman et al argue that predictive models in healthcare lack adequate oversight and regulation. They highlight the ...
