* improve consecutive Tensor indexing: instead of O(idx_counts * src_dims), it can be just O(idx_counts)
* test correctness
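
A rough sketch of the complexity difference, not the code from this change. It assumes `src_dims` refers to the sizes of the indexed source dimensions and that "consecutive" means index tensors on adjacent axes; the function names `index_masked` and `index_flat` are hypothetical and exist only for illustration.

```python
# Hypothetical illustration only; not the implementation in this change.

def index_masked(src, shape, rows, cols):
    """Select src[r, c] by comparing each index pair against every source
    position (one-hot style). Work: len(rows) * shape[0] * shape[1]."""
    n_rows, n_cols = shape
    out = []
    for r, c in zip(rows, cols):
        acc = 0
        for i in range(n_rows):
            for j in range(n_cols):
                if i == r and j == c:
                    acc += src[i * n_cols + j]
        out.append(acc)
    return out


def index_flat(src, shape, rows, cols):
    """Fold the two consecutive indices into one flat row-major offset,
    then do a single lookup per index pair. Work: len(rows)."""
    _, n_cols = shape
    return [src[r * n_cols + c] for r, c in zip(rows, cols)]


src = list(range(12))  # row-major 3x4 tensor stored as a flat list
shape = (3, 4)
rows, cols = [0, 2, 1], [3, 0, 2]

# Both paths must agree; only the amount of work differs.
assert index_masked(src, shape, rows, cols) == index_flat(src, shape, rows, cols)
```

The equality check at the end mirrors the "test correctness" item: the faster path is only acceptable if it returns the same values as the slower one.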