Tetromino Pixel
"Tetromino Pixel" is an umbrella term for several research directions that use pixel-level information from images and videos to perform higher-level tasks. Current research emphasizes deep learning models, including transformers, U-Nets, and diffusion models, that process visual data and integrate it with other modalities such as text and 3D point clouds. Applications include image captioning, object detection, 3D reconstruction, and robotic control. The work matters for advancing multimodal AI, improving the efficiency and interpretability of computer vision systems, and enabling new capabilities in areas such as autonomous navigation and medical image analysis.