Tetromino Pixel
"Tetromino Pixel" is an umbrella term for research that uses pixel-level information from images and videos to perform higher-level tasks. Current work relies on deep learning models, including transformers, U-Nets, and diffusion models, to process visual data and combine it with other modalities such as text and 3D point clouds. Applications include image captioning, object detection, 3D reconstruction, and robotic control. The research matters for advancing multimodal AI, improving the efficiency and interpretability of computer vision systems, and enabling new capabilities in areas such as autonomous navigation and medical image analysis.
Papers
March 17, 2024
March 11, 2024
February 21, 2024
January 30, 2024
January 18, 2024
January 10, 2024
January 8, 2024
January 4, 2024
January 3, 2024
December 31, 2023
December 15, 2023
December 4, 2023
December 2, 2023
November 29, 2023
November 27, 2023
November 22, 2023
November 6, 2023
November 2, 2023