Bacalhau Project

Data. Transformed.

Bacalhau transforms Big Data processing by giving developers simple, low-cost, “Distributed First” tools that unlock a new collaborative ecosystem.

Why Bacalhau?

Bacalhau seeks to address deep-rooted gaps in the research community. Existing Big Data processing tools are:

  • Expensive and slow, particularly with large datasets
  • Complicated, requiring significant rewriting of existing code to use
  • Hard to use for sharing and collaborating with other organizations

Bacalhau transforms Big Data processing by giving developers simple, low-cost, decentralized tools that unlock a new collaborative ecosystem:

  • Fast and reliable - the network runs the jobs where the data is already stored
  • Simple and familiar - use the tools you already know and love (Docker, GNU, Python, R, MATLAB, WASM)
  • Collaborative - all content can be shared using the globally distributed IPFS network

How does it work?

Bacalhau is a network of open compute resources made available to serve any data processing workload.

Bacalhau enables users to run arbitrary Docker containers and WASM images as tasks against data stored in IPFS. This architecture is referred to as Compute Over Data (or COD). The project's name comes from "Bacalhau", the Portuguese word for salted cod.

Bacalhau operates as a peer-to-peer network of nodes, where each node participates in executing (computing) jobs submitted to the cluster. Bacalhau CLI requests are sent to nodes in the cluster, which then broadcast messages over the transport layer to the other nodes in the cluster.
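
The request flow described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not Bacalhau's actual protocol or API: the Job and Node classes, the on_job handler, and the "cid-…" identifiers are all hypothetical, and the "transport layer" is reduced to a plain loop over peers. It shows one node receiving a request, passing it to its peers, and the job running only on nodes that already store the input data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    image: str          # container image to run, e.g. "ubuntu"
    command: list[str]  # command executed inside the container
    input_cid: str      # hypothetical content ID of the input data


@dataclass
class Node:
    name: str
    local_cids: set[str]                              # data this node already stores
    peers: list[Node] = field(default_factory=list)   # stand-in for the transport layer
    executed: list[str] = field(default_factory=list)

    def submit(self, job: Job) -> list[str]:
        """Entry point for a CLI request: handle it locally, then broadcast to peers.

        Returns the names of the nodes that ran the job.
        """
        runners = []
        for node in [self, *self.peers]:
            if node.on_job(job):
                runners.append(node.name)
        return runners

    def on_job(self, job: Job) -> bool:
        """Run the job only if its input data is already stored on this node."""
        if job.input_cid not in self.local_cids:
            return False
        self.executed.append(job.job_id)
        return True


# Three nodes; only "a" and "c" hold the input data for the job.
a = Node("a", {"cid-1"})
b = Node("b", {"cid-2"})
c = Node("c", {"cid-1", "cid-3"})
a.peers = [b, c]

job = Job("job-1", "ubuntu", ["echo", "hello"], "cid-1")
print(a.submit(job))  # ['a', 'c']
```

In this toy model every node holding the data runs the job. A real network would also need to decide how many nodes should execute a job and how results are published, which this sketch leaves out.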

Submitting Jobs is Easy

$ bacalhau run ubuntu echo hello

$ bacalhau list --wide --sort-by=id --id-filter=<JOB_ID>

$ ipfs get <RESULT_CID>
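
The three commands above (submit a job, look it up, fetch the result) can also be driven from a script. The following is a minimal sketch, assuming the bacalhau and ipfs binaries are on the PATH and accept exactly the arguments shown above. The helper names (run_command, list_command, get_command, execute) are hypothetical. Because the CLI output format is not described here, the sketch does not try to parse a job ID or result CID from the output; both must be supplied by the caller.

```python
import shlex
import subprocess


def run_command(image: str, *args: str) -> list[str]:
    """Argument list for: bacalhau run <image> <args...>"""
    return ["bacalhau", "run", image, *args]


def list_command(job_id: str) -> list[str]:
    """Argument list for looking up one job, using the flags shown above."""
    return ["bacalhau", "list", "--wide", "--sort-by=id", f"--id-filter={job_id}"]


def get_command(result_cid: str) -> list[str]:
    """Argument list for fetching a result from IPFS."""
    return ["ipfs", "get", result_cid]


def execute(argv: list[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return completed.stdout


# Print the first command as it would be typed in a shell, without running it.
print(shlex.join(run_command("ubuntu", "echo", "hello")))
```

To run a step for real, pass the argument list to execute, for example execute(run_command("ubuntu", "echo", "hello")), then read the job ID from the returned text by hand before building the next command.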