Using Telegram as infinite cloud storage

14 Feb, 2024

Transforming Telegram into an E2EE infinite cloud storage

programming, rust

17/08/24 Edit: I decided to rewrite this post because it was very technical and difficult to understand.

A while ago I was backing up my homelab and the total size was about 300 GB, which was no problem to store locally, but I still needed an external backup. So I started looking for a cloud storage provider, but they are too expensive for sporadic things like huge backups, and there is also the issue of trust.

While researching, I remembered Teledrive, a project a friend told me about a long time ago that uses Telegram to store files. That project is dead, though, and I was feeling like doing some tinkering.

How does it work?

Telegram has two APIs: one for bots and one for clients. The bot API is usually easier to use but has more limitations. The most impactful ones are the 20 MB limit for downloads and the 50 MB limit for uploads, which matter a lot when the whole point is sending and downloading files. The client API, on the other hand, only has a 2 GB per-file limit.
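To make the difference between those limits concrete, here is a small back-of-the-envelope sketch. The byte values are assumptions (decimal megabytes and gigabytes; Telegram may count them slightly differently), and parts_needed is a hypothetical helper, not part of any Telegram library.

```rust
/// Upload limit of the bot API, assuming decimal megabytes (50 MB).
const BOT_UPLOAD_LIMIT: u64 = 50_000_000;
/// Per-file limit of the client API, assuming decimal gigabytes (2 GB).
const CLIENT_FILE_LIMIT: u64 = 2_000_000_000;

/// Number of separate files needed to store `total` bytes when each
/// file may hold at most `limit` bytes (ceiling division).
fn parts_needed(total: u64, limit: u64) -> u64 {
    (total + limit - 1) / limit
}
```

For the 300 GB backup mentioned earlier, parts_needed(300_000_000_000, CLIENT_FILE_LIMIT) gives 150 files, while the same call with BOT_UPLOAD_LIMIT gives 6000.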

With that in mind, I decided to write some Rust code as a proof of concept.

Connecting to Telegram

The first and most important step is to figure out how to connect to Telegram as a client. For that we can use grammers, a very powerful and simple-to-use library for interacting with the Telegram API (both for bots and clients). Now that we have everything, we can start writing some code.

Receiving the OTP code

Since we are using the client API, we need to authenticate with a phone number and then confirm with a code. First, let’s create a simple function that prints a message and reads the user’s reply:

use std::io::{self, BufRead, Write};

fn prompt(message: &str) -> io::Result<String> {
    let mut stdout = io::stdout().lock();

    stdout.write_all(message.as_bytes())?;
    stdout.flush()?;

    let mut stdin = io::stdin().lock();
    let mut line = String::new();

    stdin.read_line(&mut line)?;

    Ok(line)
}

From grammers examples/downloader.rs
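One thing to keep in mind is that read_line keeps the trailing newline, so the string returned by prompt still ends with "\n". Below is a minimal sketch of a variant that trims the answer and is generic over its input and output, so it can be exercised without a terminal. The name prompt_trimmed is hypothetical and not part of grammers.

```rust
use std::io::{self, BufRead, Write};

/// Same idea as `prompt`, but generic over the input and output,
/// and with the surrounding whitespace removed from the answer.
fn prompt_trimmed<R: BufRead, W: Write>(
    input: &mut R,
    output: &mut W,
    message: &str,
) -> io::Result<String> {
    output.write_all(message.as_bytes())?;
    output.flush()?;

    let mut line = String::new();
    input.read_line(&mut line)?;

    // `read_line` keeps the "\n" (or "\r\n"), so strip it here.
    Ok(line.trim().to_string())
}
```

In a real program you would pass io::stdin().lock() and io::stdout().lock() as the two first arguments.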

With that done, we can start the login step:

// Dependencies
use grammers_client::{Client, Config, SignInError};
use grammers_session::Session;
use std::env;

To persist the session, we can store it in a file to retrieve later:

const SESSION_FILE: &str = "/path/to/telegram.session";

For everything to work, you will need an async runtime. At the time I was using actix, but you could replace it with a regular #[tokio::main]:

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // ...
}

Now we can initialize our client:

let telegram_client = Client::connect(Config {
    session: Session::load_file_or_create(SESSION_FILE)?,
    api_id: env::var("API_ID")
        .expect("API_ID not set")
        .parse()
        .expect("Failed to parse API_ID"),
    api_hash: env::var("API_HASH")
        .expect("API_HASH not set")
        .parse()
        .expect("Failed to parse API_HASH"),
    params: Default::default(),
}).await.unwrap();
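The expect calls above panic when a variable is missing or malformed, which is fine for a proof of concept. If you would rather report the problem as an error, a minimal sketch could look like this. It assumes API_ID is a 32-bit integer, and both function names are hypothetical helpers.

```rust
use std::env;

/// Parses a raw `API_ID` value into an integer.
fn parse_api_id(raw: &str) -> Result<i32, String> {
    raw.trim()
        .parse::<i32>()
        .map_err(|err| format!("API_ID must be an integer, got {:?}: {}", raw, err))
}

/// Reads `API_ID` from the environment, reporting a missing or
/// malformed value as an error instead of panicking.
fn api_id_from_env() -> Result<i32, String> {
    let raw = env::var("API_ID").map_err(|_| "API_ID not set".to_string())?;
    parse_api_id(&raw)
}
```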

And then sign in:

// Since we are using a file to persist the session, there is no need to log in twice, right?
if !telegram_client.is_authorized().await.unwrap() {
    println!("Signing in");
    let phone_number = prompt("Enter your phone number: ")?;
    let token = telegram_client
        .request_login_code(&phone_number)
        .await
        .unwrap();
    let code = prompt("Enter the code that you received: ")?;
    let signed_in = telegram_client.sign_in(&token, &code).await;

    match signed_in {
        Err(SignInError::PasswordRequired(pass_token)) => {
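            // Not every account has a password hint, so avoid panicking on None
            let hint = pass_token
                .hint()
                .map(|h| h.to_string())
                .unwrap_or_else(|| String::from("none"));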
            let prompt_msg = format!("Enter your password (hint {}): ", &hint);
            let password = prompt(prompt_msg.as_str())?;

            telegram_client
                .check_password(pass_token, password.trim())
                .await
                .unwrap();
        }
        Ok(_) => (),
        Err(err) => panic!("{}", err),
    }
}

Encrypting files

Another part of the idea is end-to-end encryption, and for this I will use orion to create some helper functions that will be used later:

use orion::aead;
use sha2::{Digest, Sha256};
use std::env;

fn encrypt(data: &[u8]) -> Vec<u8> {
    let mut hasher = Sha256::new();
    hasher.update(
        env::var("ENCRYPTION_KEY")
            .expect("Missing environment variable ENCRYPTION_KEY")
            .as_bytes(),
    );
    let pass_hash = hasher.finalize();
    let key = aead::SecretKey::from_slice(&pass_hash).unwrap();

    aead::seal(&key, data).unwrap()
}

fn decrypt(data: &[u8]) -> Vec<u8> {
    let mut hasher = Sha256::new();
    hasher.update(
        env::var("ENCRYPTION_KEY")
            .expect("Missing environment variable ENCRYPTION_KEY")
            .as_bytes(),
    );
    let pass_hash = hasher.finalize();
    let key = aead::SecretKey::from_slice(&pass_hash).unwrap();

    aead::open(&key, data).unwrap()
}

Is there a better way to handle this? Probably, but for now I will use it this way.
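One detail worth keeping in mind is that the sealed output is larger than the input. As far as I can tell from orion's documentation, aead::seal returns the nonce, the ciphertext and the authentication tag together, which for XChaCha20-Poly1305 means 24 + 16 extra bytes. The sketch below treats those sizes as assumptions and uses hypothetical helper names to check whether a file still fits under the 2 GB limit after encryption.

```rust
/// Nonce size prepended to the output (assumed: XChaCha20 uses 24 bytes).
const NONCE_LEN: u64 = 24;
/// Authentication tag size appended to the output (assumed: Poly1305 uses 16 bytes).
const TAG_LEN: u64 = 16;
/// 2 GB per-file limit of the client API.
const TELEGRAM_MAX_FILESIZE: u64 = 2_000_000_000;

/// Length of the sealed output for a plaintext of `plain_len` bytes.
fn sealed_len(plain_len: u64) -> u64 {
    plain_len + NONCE_LEN + TAG_LEN
}

/// Whether a plaintext of `plain_len` bytes is still below the
/// per-file limit once it has been encrypted.
fn fits_after_encryption(plain_len: u64) -> bool {
    sealed_len(plain_len) < TELEGRAM_MAX_FILESIZE
}
```

In practice this only matters for files within 40 bytes of the limit, but it is a cheap check to make before uploading.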

Sending messages

Now that we are connected and have the functions to encrypt, it’s time to actually send some messages:

// dependencies
use grammers_client::{
    client::auth::InvocationError,
    types::{Downloadable, Message},
    Client, InputMessage,
};
use grammers_session::{PackedChat, PackedType};
use std::{env, sync::Arc};
use tokio::sync::Mutex;

// This is only required when using it with actix
type TelegramClient = Arc<Mutex<Client>>;

To send a message, we need to specify a PackedChat struct for the send_message function. As it is very annoying to have to type everything over and over again, let’s create a helper function to return the correct struct:

fn message_packet() -> PackedChat {
    PackedChat {
        ty: PackedType::Chat,
        id: env::var("GROUP_ID")
            .expect("Missing environment variable GROUP_ID")
            .parse()
            .expect("Failed to parse GROUP_ID to i64"),
        access_hash: None,
    }
}

Now let’s create a function to send an empty message with a file:

async fn send_message_with_document(
    client: TelegramClient,
    filepath: &String,
) -> Result<Message, InvocationError> {
    let client = client.lock().await;
    let uploaded_file = (*client).upload_file(filepath).await.unwrap();
    (*client)
        .send_message(
            message_packet(),
            InputMessage::text("").document(uploaded_file),
        )
        .await
}

Downloading the file later is just as simple:

async fn download_message_document(
    client: TelegramClient,
    message_id: i32,
    output_path: String,
) {
    let client = client.lock().await;
    let message = &(*client)
        .get_messages_by_id(message_packet(), &[message_id])
        .await
        .unwrap()[0];
    // No need to loop
    // There is only one message with the id

    match message {
        Some(msg) => {
            if let Some(media) = msg.media() {
                (*client)
                    .download_media(&Downloadable::Media(media), output_path)
                    .await
                    .expect("Failed to download file")
            }
        }
        None => {}
    }
}

Splitting the files into chunks

When a file is bigger than our upload limit, or when we want faster uploads and downloads, we need to split it into smaller chunks. For that we just need to open the file, split the bytes and then upload the chunks. Simple enough, right?

Not quite: we are talking about large files, over 2 GB. With that approach the whole file would have to be loaded into RAM first, which could get the process killed by the OOM killer. To solve this we will use mmap and memory-mapped files, which let us read a file without loading it entirely into RAM, at the cost of more disk I/O.

use memmap::Mmap;
use std::{fs, path::PathBuf};

// 200 MiB (200 * 1024 * 1024 bytes)
const TARGET_CHUNK_SIZE: i64 = 209715200;
// 2 GB (decimal)
const TELEGRAM_MAX_FILESIZE: u64 = 2000000000;

async fn parse(...) {
    let file = fs::File::open(&filepath).unwrap();
    let filesize = file.metadata().unwrap().len();

    // If the file is smaller than 2 GB
    // we can just encrypt and upload
    if filesize < TELEGRAM_MAX_FILESIZE {
        let mmap_file = unsafe { Mmap::map(&file).unwrap() };
        let encrypted = encryption::encrypt(&mmap_file);

        // Send and store metadata about the file
        // in a database
        // ...
    } else {
        // If the file is bigger than 2GB
        // we will need to split the file into smaller chunks.
        let mmap_file = unsafe { Mmap::map(&file).unwrap() };
        let filesize = filesize as i64;
        // Calculate the amount of chunks based on the
        // target chunk size.
        // Smaller TARGET_CHUNK_SIZE equals more chunks
        let num_chunks = (filesize + TARGET_CHUNK_SIZE - 1) / TARGET_CHUNK_SIZE;
        let chunksize = (filesize + num_chunks - 1) / num_chunks;
        // Split the contents in chunks
        let chunks = mmap_file.chunks(chunksize as usize).collect::<Vec<&[u8]>>();

        for (index, chunk) in chunks.iter().enumerate() {
            let encrypted = encryption::encrypt(chunk);

            // Again, send and store metadata about the file
        }
    }
}
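The two ceiling divisions in the snippet above are easier to follow with concrete numbers. Here is a minimal, std-only sketch of the same arithmetic. The names chunk_layout and split_evenly are hypothetical, and the empty-input guards are an addition that the original snippet does not need, since it only chunks files over 2 GB.

```rust
/// Returns `(num_chunks, chunk_size)` for a file of `filesize` bytes,
/// using the same ceiling divisions as the snippet above.
fn chunk_layout(filesize: u64, target_chunk_size: u64) -> (u64, u64) {
    if filesize == 0 {
        // Nothing to split, and this avoids a division by zero below.
        return (0, 0);
    }
    let num_chunks = (filesize + target_chunk_size - 1) / target_chunk_size;
    let chunk_size = (filesize + num_chunks - 1) / num_chunks;
    (num_chunks, chunk_size)
}

/// Splits `data` into evenly sized slices; only the last one may be shorter.
fn split_evenly(data: &[u8], target_chunk_size: usize) -> Vec<&[u8]> {
    if data.is_empty() {
        return Vec::new();
    }
    let (_, chunk_size) = chunk_layout(data.len() as u64, target_chunk_size as u64);
    data.chunks(chunk_size as usize).collect()
}
```

For a 2,500,000,000 byte file and the 200 MiB target, chunk_layout returns 12 chunks of 208,333,334 bytes each. The second division is what evens out the sizes, so the last chunk is only a few bytes shorter than the others.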

To assemble the file later you just need to do the opposite: download the chunks, decrypt each one, and write the contents (in order) into a single file.
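A minimal sketch of that last step, assuming each chunk has already been downloaded and decrypted and that its position in the original file was stored with the metadata. The Chunk struct and the assemble function are hypothetical names used only for illustration.

```rust
use std::io::{self, Write};

/// One downloaded (and already decrypted) chunk. `index` is the chunk's
/// position in the original file, as stored in the metadata.
struct Chunk {
    index: usize,
    data: Vec<u8>,
}

/// Writes the chunks to `out` in index order, whatever order they
/// were downloaded in, and returns the total number of bytes written.
fn assemble<W: Write>(mut chunks: Vec<Chunk>, out: &mut W) -> io::Result<u64> {
    chunks.sort_by_key(|chunk| chunk.index);

    let mut written = 0u64;
    for chunk in &chunks {
        out.write_all(&chunk.data)?;
        written += chunk.data.len() as u64;
    }
    out.flush()?;
    Ok(written)
}
```

Passing a std::fs::File (ideally wrapped in a BufWriter) as out writes the result straight to disk. For very large files you would probably want to write each chunk as soon as it is decrypted instead of collecting them all in memory first.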

Conclusion

This post is just a very basic showcase of the concept, and the solution is not perfect: download and upload speeds are slow, mostly because Telegram itself is slow, and (again) there is also the issue of trust. If you are looking for a way to store files on Telegram, there is already a good project called Teldrive.

Overall, this is just an experiment that I made to try some new things and maybe learn something in the process.

See ya!
