Fascinating read on the economics of publishing. I have highlighted the relevant aspects below; I haven’t explored the different models for generating revenue.
Comments and suggestions are welcome.
(Fun fact: Sci-Hub isn’t a “pirate” website but a reminder to various publishers that their business model is utterly broken.)
Publishing scholarly work is more complex than simply hosting PDFs. While we could all choose to host un-reviewed content on our personal websites, there is value added by publishing with a professional organisation. A publisher should provide three things: discoverability, availability, and persistence. Providing these costs money in terms of staff, volunteer support, infrastructure, systems, servers, and bandwidth.
DISCOVERABILITY – CONTENT MUST BE DISCOVERABLE
Making content discoverable requires not only software and hosting, but also paid services like DOIs, plus strategic planning and enforcement of standards around metadata, library structure, visual identity, and classification. Organisations like ACM and IEEE achieve this through professional staff who prepare author-submitted content for their digital libraries, assign DOIs, verify content and metadata, and manage document release operations. arXiv follows a different model: this work is distributed to authors and volunteer subject moderators, and some paid services (like DOIs) are excluded in favour of arXiv-specific identifiers.
AVAILABILITY – CONTENT MUST BE AVAILABLE FOR DOWNLOAD
Serving articles has a material expense in terms of bandwidth and storage. How articles are made “available” varies significantly. Organisations like ACM and IEEE put the majority of their content behind a paywall by default, although both have a range of options for open access publication. arXiv makes all content freely available by default, but requests annual membership fees based on usage. USENIX follows a different model, where conference budgets must generate enough revenue to cover the cost of gold open access publishing.
PERSISTENCE – CONTENT MUST BE PRESERVED
It is also important that content is published in resilient formats that will endure beyond the lifetime of the publishing organisation. ACM has recently transitioned to an XML-based JATS/BITS archival format, and from 2020 onwards publications are no longer coupled to reading formats like PDF. This future-proofs publications, ensuring content can be rendered as PDF, HTML, ePub, and formats that have not yet been imagined. This initiative required significant coordinated investment in staff, volunteer support, software, and infrastructure. There are also long-term costs for fail-safe preservation through services like Portico, which keep content available in the event that ACM’s platforms fail or ACM ceases to exist.
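To make the idea of decoupling content from reading formats concrete, here is a minimal sketch in Python. It assumes a heavily simplified, JATS-like fragment: the element names (article, front, article-meta, title-group, article-title, body, sec, title, p) are real JATS tags, but the fragment omits most of what a valid JATS document requires. The sample text and the function names (extract, to_html, to_text) are hypothetical illustrations, not ACM's actual pipeline.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified JATS-like fragment (an assumption for illustration, not a valid JATS file).
JATS_SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>Structured content outlives any one reading format.</p>
    </sec>
  </body>
</article>"""


def extract(xml_text):
    """Pull the article title and (heading, paragraphs) pairs out of the XML."""
    root = ET.fromstring(xml_text)
    title = root.findtext("front/article-meta/title-group/article-title", default="")
    sections = []
    for sec in root.findall("body/sec"):
        heading = sec.findtext("title", default="")
        paragraphs = ["".join(p.itertext()) for p in sec.findall("p")]
        sections.append((heading, paragraphs))
    return title, sections


def to_html(xml_text):
    """Render the structured source as a small HTML fragment."""
    title, sections = extract(xml_text)
    parts = [f"<h1>{escape(title)}</h1>"]
    for heading, paragraphs in sections:
        parts.append(f"<h2>{escape(heading)}</h2>")
        parts.extend(f"<p>{escape(p)}</p>" for p in paragraphs)
    return "\n".join(parts)


def to_text(xml_text):
    """Render the same source as plain text."""
    title, sections = extract(xml_text)
    lines = [title, ""]
    for heading, paragraphs in sections:
        lines.append(heading)
        lines.extend(paragraphs)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_html(JATS_SAMPLE))
    print(to_text(JATS_SAMPLE))
```

Both renderers read from the same structured source, so supporting a new reading format means adding one more function rather than re-archiving the content.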