* test asm_gemm in CI
* default float16
* use a smaller shape for multi
* smaller size
* smaller for CI
* smaller for ci
* need half