Tetromino Pixel
"Tetromino Pixel," a term encompassing various research directions, broadly focuses on leveraging pixel-level information from images and videos to achieve higher-level tasks. Current research emphasizes using deep learning models, including transformers, U-Nets, and diffusion models, to process visual data and integrate it with other modalities like text and 3D point clouds for applications such as image captioning, object detection, 3D reconstruction, and robotic control. This work is significant for advancing multimodal AI, improving the efficiency and interpretability of computer vision systems, and enabling new capabilities in areas like autonomous navigation and medical image analysis.
Papers
July 27, 2022
July 26, 2022
July 14, 2022
July 3, 2022
June 13, 2022
June 10, 2022
June 8, 2022
June 4, 2022
May 23, 2022
April 11, 2022
April 3, 2022
April 2, 2022
March 23, 2022
March 7, 2022
March 2, 2022
February 4, 2022
December 21, 2021
December 17, 2021