Local files and screenshots for the repo update.
Benchmark pass rate
Tool-use harness pass rate
Coding harness pass rate
Memory harness pass rate
benchmark: 4 cases
harnesses: reasoning 60%, coding 100%, tool_use 100%, memory 100%
commit pushed to GitHub